I saw the patch for Nutch 2.x where you replaced CommonsHttpSolrServer
with ConcurrentUpdateSolrServer, but in 1.6
SolrUtils.getCommonsHttpSolrServer is used to obtain the SolrServer.
Should we add a getConcurrentUpdateSolrServer to SolrUtils?
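If it helps the discussion, such a method in SolrUtils might look roughly
like the sketch below. This is only a sketch, not a patch: I'm assuming the
SolrJ 4.x ConcurrentUpdateSolrServer(String, int, int) constructor, and the
queue-size/thread-count config keys here are hypothetical placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;

public class SolrUtilsSketch {

  // Sketch: mirrors getCommonsHttpSolrServer, but returns the concurrent
  // client instead. The two config keys with defaults are assumptions,
  // not existing Nutch properties.
  public static ConcurrentUpdateSolrServer getConcurrentUpdateSolrServer(
      Configuration job) {
    String solrUrl = job.get("solr.server.url");
    int queueSize = job.getInt("solr.update.queue.size", 1000);   // hypothetical key
    int threadCount = job.getInt("solr.update.threads", 2);       // hypothetical key
    return new ConcurrentUpdateSolrServer(solrUrl, queueSize, threadCount);
  }
}
```

The concurrent client buffers documents in a queue and flushes them from
background threads, so callers would still need to close/shutdown it at the
end of the job to avoid losing buffered updates.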
As I understand it, the exception I got was caused by an empty result set
returned by the SolrQuery... could it be because url is used as the uniqueKey?
I see in SolrDeleteDuplicates.java, line 226:

    solrQuery.setFields(SolrConstants.ID_FIELD,
                        SolrConstants.BOOST_FIELD,
                        SolrConstants.TIMESTAMP_FIELD,
                        SolrConstants.DIGEST_FIELD);
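For reference, the frame the trace points at (SolrDeleteDuplicates.java:268)
ends up calling get(0) on the query's result list, and java.util.ArrayList
throws exactly this kind of IndexOutOfBoundsException when the list is empty.
A plain-Java illustration of the failure and the obvious guard; the
readFirstId helper is mine, just to mirror the pattern, not Nutch code:

```java
import java.util.ArrayList;
import java.util.List;

public class EmptyResultDemo {

  // Hypothetical helper mirroring the pattern in SolrDeleteDuplicates:
  // read the first document id from a query's result list.
  static String readFirstId(List<String> results) {
    if (results.isEmpty()) {
      return null; // guard: an empty result set is not an error here
    }
    return results.get(0);
  }

  public static void main(String[] args) {
    List<String> empty = new ArrayList<String>();
    try {
      empty.get(0); // what the unguarded code effectively does
    } catch (IndexOutOfBoundsException e) {
      // message varies by JDK, e.g. "Index: 0, Size: 0" on Java 7
      System.out.println("caught: " + e.getMessage());
    }
    System.out.println(readFirstId(empty)); // prints "null"
  }
}
```

So if the uniqueKey change makes some query return zero documents, the
unguarded get(0) would produce exactly the trace quoted below.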

On Tue, Apr 9, 2013 at 9:15 PM, Lewis John Mcgibbney <
[email protected]> wrote:

> Before we do the upgrade we need to consolidate all of these use cases.
> What criteria do we want to review and accept as the unique key? Will this
> change between Nutch trunk and 2.x?
>
> On Tuesday, April 9, 2013, Amit Sela <[email protected]> wrote:
> > Well, according to our other correspondence, the only thing I did
> > differently in my schema.xml (schema-solr4.xml) before rebuilding Nutch
> > was <uniqueKey>url</uniqueKey> instead of <uniqueKey>id</uniqueKey>.
> >
> > It all goes well until the dedup phase where the MapReduce throws:
> >
> > java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
> > at java.util.ArrayList.rangeCheck(ArrayList.java:604)
> > at java.util.ArrayList.get(ArrayList.java:382)
> > at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:268)
> > at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:241)
> > at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:236)
> > at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:216)
> > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
> > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
> > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
> > at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> > at java.security.AccessController.doPrivileged(Native Method)
> > at javax.security.auth.Subject.doAs(Subject.java:415)
> > at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
> > at org.apache.hadoop.mapred.Child.main(Child.java:249)
> >
> > Thanks.
> >
> >
> > On Mon, Apr 8, 2013 at 10:33 PM, Lewis John Mcgibbney <
> > [email protected]> wrote:
> >
> >> It would probably be best to describe what you've tried here: possibly a
> >> paste of your schema, what you've done (if anything) to the Nutch source
> >> to get it working with Solr 4, etc.
> >> The stack trace you get would also be beneficial.
> >> Thank you
> >> Lewis
> >>
> >>
> >> On Mon, Apr 8, 2013 at 4:13 AM, Amit Sela <[email protected]> wrote:
> >>
> >> > Is it possible? I saw an open Jira about connecting to SolrCloud via
> >> > ZooKeeper, but with a direct connection to one of the servers, is it
> >> > possible to index with Nutch 1.6 into Solr 4.2 set up as a cloud with
> >> > a ZooKeeper ensemble? I ask because I keep getting IndexOutOfBounds
> >> > exceptions in the dedup M/R phase.
> >> >
> >> > Thanks.
> >> >
> >>
> >>
> >>
> >> --
> >> *Lewis*
> >>
> >
>
> --
> *Lewis*
>
