I downgraded nutch to lucene 2.9.1, moved the index directory out of the
way, and the solrindex process works now.
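
A minimal sketch of how the version mismatch could be checked before swapping jars, assuming the jars follow the lucene-<module>-<version>.jar naming seen in the listings quoted further down. This is only an illustration, not part of nutch or solr, and the function names `lucene_versions` and `mismatches` are hypothetical:

```python
import re
from collections import defaultdict

# Matches names such as lucene-core-2.9.1.jar -> ("core", "2.9.1").
# Assumption: module names are lowercase letters only, as in the thread.
JAR_RE = re.compile(r"lucene-([a-z]+)-(\d+(?:\.\d+)*)\.jar$")


def lucene_versions(jar_paths):
    """Map each Lucene module name to the set of versions found in jar_paths."""
    found = defaultdict(set)
    for path in jar_paths:
        match = JAR_RE.search(path)
        if match:
            found[match.group(1)].add(match.group(2))
    return dict(found)


def mismatches(solr_jars, nutch_jars):
    """Return modules present on both sides whose versions differ."""
    solr = lucene_versions(solr_jars)
    nutch = lucene_versions(nutch_jars)
    result = {}
    for module in sorted(set(solr) & set(nutch)):
        if solr[module] != nutch[module]:
            result[module] = (sorted(solr[module]), sorted(nutch[module]))
    return result
```

The path lists could come from something like glob.glob("/opt/solr/lib/lucene-*.jar") and glob.glob("/opt/nutch/lib/lucene-*.jar"). With the jars quoted below, the modules reported as differing would be core and misc (2.9.1 on the solr side, 3.0.1 on the nutch side).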

The next issue I am having occurs when solrindex hits the autocommit point.

    <!-- Perform a <commit/> automatically under certain conditions:
         maxDocs - number of updates since last commit is greater than this
         maxTime - oldest uncommitted update (in ms) is this long ago -->
    <autoCommit>
      <maxDocs>10000</maxDocs>
      <maxTime>10000</maxTime>
    </autoCommit>


When this starts occurring, it slows the process down to a crawl because it
is single threaded. I am not sure whether the issue is that the solrindexer is
single threaded or that the glassfish java application server is set up to be
single threaded. I need to look at the glassfish documentation.
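
To make the two thresholds in the autoCommit block above concrete, here is a rough sketch of the trigger condition as the config comment describes it. It is an approximation for reasoning about commit frequency, not Solr's actual implementation, and the names `autocommit_due` and `first_trigger` are hypothetical:

```python
def autocommit_due(pending_docs, oldest_pending_ms,
                   max_docs=10000, max_time_ms=10000):
    """True when either threshold from the autoCommit block is crossed.

    pending_docs: updates since the last commit.
    oldest_pending_ms: age in ms of the oldest uncommitted update.
    """
    if pending_docs <= 0:
        return False  # nothing to commit
    return pending_docs > max_docs or oldest_pending_ms >= max_time_ms


def first_trigger(docs_per_second, max_docs=10000, max_time_ms=10000):
    """At a steady indexing rate, which threshold fires first and after how
    many seconds. Assumes updates arrive evenly, which is a simplification."""
    seconds_to_max_docs = (max_docs + 1) / docs_per_second
    seconds_to_max_time = max_time_ms / 1000.0
    if seconds_to_max_docs < seconds_to_max_time:
        return ("maxDocs", seconds_to_max_docs)
    return ("maxTime", seconds_to_max_time)
```

Under these assumptions, any steady rate at or below about 1000 documents per second would hit the maxTime limit first, meaning a commit roughly every 10 seconds for the whole run.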


On Thu, Nov 4, 2010 at 6:30 PM, Steve Cohen <[email protected]> wrote:

> I noticed that solr is using lucene 2.9.1
>
> /opt/solr/lib/lucene-core-2.9.1.jar
> /opt/solr/lib/lucene-queries-2.9.1.jar
> /opt/solr/lib/lucene-memory-2.9.1.jar
> /opt/solr/lib/lucene-spellchecker-2.9.1.jar
> /opt/solr/lib/lucene-highlighter-2.9.1.jar
> /opt/solr/lib/lucene-analyzers-2.9.1.jar
> /opt/solr/lib/lucene-misc-2.9.1.jar
> /opt/solr/lib/lucene-snowball-2.9.1.jar
>
> and nutch is using lucene 3.0.1
>
> /opt/nutch/lib/lucene-misc-3.0.1.jar
> /opt/nutch/lib/lucene-core-3.0.1.jar
>
> I found this
> http://osdir.com/ml/solr-user.lucene.apache.org/2010-08/msg00907.html
>
> "Solr 1.4 can read Lucene 2.9 index or older."
>
> So my question is: can I just grab the 2.9.1 jar files from solr and
> throw them into nutch? I guess I can test that and find out.
>
>
> On Thu, Nov 4, 2010 at 4:49 PM, Steve Cohen <[email protected]> wrote:
>
>> I got the error when I ran solrindex. It finished the map tasks and gave
>> me the error when it started on the reduce tasks.
>>
>>
>> On Thu, Nov 4, 2010 at 4:42 PM, Alexander Aristov <
>> [email protected]> wrote:
>>
>>> I am pretty sure you are seeing problems caused by different lucene jar
>>> versions.
>>>
>>> A newer solr should be able to convert an old index into a new one. Why
>>> not try making a fresh index?
>>>
>>> Best Regards
>>> Alexander Aristov
>>>
>>>
>>> On 4 November 2010 23:39, Steve Cohen <[email protected]> wrote:
>>>
>>> > Nutch 1.0 comes with solr 1.3 and we were running solr 1.3 using
>>> > glassfish as the java application server. I tried upgrading solr to 1.4
>>> > but now it is also giving me the "incompatible format version:" error
>>> > when I relaunched it.
>>> >
>>> > I see this from the solr faq:
>>> >
>>> > "What does "CorruptIndexException: Unknown format version" mean ?
>>> >
>>> > This happens when the Lucene code in Solr used to read the index files
>>> > from disk encounters index files in a format it doesn't recognize.
>>> > The most common cause is from using a version of Solr+Lucene that is
>>> > older than the version used to create that index. "
>>> >
>>> > But I don't know if unknown format version and incompatible format
>>> > version are the same issue. Is there an easy way to recreate the index?
>>> > I don't need the data.
>>> >
>>> > Thanks,
>>> > Steve Cohen
>>> >
>>> > On Thu, Nov 4, 2010 at 2:40 PM, Markus Jelsma <[email protected]>
>>> > wrote:
>>> >
>>> > > What version of Solr are you indexing to and what is the Solr log
>>> > > telling you?
>>> > >
>>> > >
>>> > > > Hello I am trying to get nutch to work after upgrading from nutch
>>> > > > 1.0 to 1.2:
>>> > > >
>>> > > > solrindex map is working but as soon as I hit the reduce stage I
>>> > > > start getting errors. I fixed a couple of the errors but I don't
>>> > > > understand what the latest one means:
>>> > > >
>>> > > > Task TASKID="task_201011031547_0001_m_000153" TASK_TYPE="MAP"
>>> > > > TASK_STATUS="SUCCESS" FINISH_TIME="1288814131303"
>>> > > > COUNTERS="{(FileSystemCounters)(FileSystemCounters)[(FILE_BYTES_READ)(FILE_BYTES_READ)(138)][(FILE_BYTES_WRITTEN)(FILE_BYTES_WRITTEN)(218)]}{(org\.apache\.hadoop\.mapred\.Task$Counter)(Map-Reduce Framework)[(COMBINE_OUTPUT_RECORDS)(Combine output records)(0)][(MAP_INPUT_RECORDS)(Map input records)(0)][(SPILLED_RECORDS)(Spilled Records)(0)][(MAP_OUTPUT_BYTES)(Map output bytes)(0)][(MAP_INPUT_BYTES)(Map input bytes)(0)][(COMBINE_INPUT_RECORDS)(Combine input records)(0)][(MAP_OUTPUT_RECORDS)(Map output records)(0)]}" .
>>> > > > ReduceAttempt TASK_TYPE="REDUCE" TASKID="task_201011031547_0001_r_000000"
>>> > > > TASK_ATTEMPT_ID="attempt_201011031547_0001_r_000000_0"
>>> > > > START_TIME="1288813765434"
>>> > > > TRACKER_NAME="tracker_search1:localhost/127\.0\.0\.1:34577"
>>> > > > HTTP_PORT="50060" .
>>> > > > ReduceAttempt TASK_TYPE="REDUCE" TASKID="task_201011031547_0001_r_000000"
>>> > > > TASK_ATTEMPT_ID="attempt_201011031547_0001_r_000000_0"
>>> > > > TASK_STATUS="FAILED" FINISH_TIME="1288814153180" HOSTNAME="search1"
>>> > > > ERROR="org\.apache\.solr\.common\.SolrException: Severe errors in solr
>>> > > > configuration\.  Check your log files for more detailed information on
>>> > > > what may be wrong\.  If you want solr to continue after configuration
>>> > > > errors, change:
>>> > > > <abortOnConfigurationError>false</abortOnConfigurationError>  in null
>>> > > > -------------------------------------------------------------
>>> > > > java\.lang\.RuntimeException: org\.apache\.lucene\.index\.CorruptIndexException: Incompatible format version: 2 expected 1 or lower
>>> > > >     at org\.apache\.solr\.core\.SolrCore\.getSearcher(SolrCore\.java:1068)
>>> > > >     at org\.apache\.solr\.core\.SolrCore\.<init>(SolrCore\.java:579)
>>> > > >     at org\.apache\.solr\.core\.CoreContainer$Initializer\.initialize(CoreContainer\.java:137)
>>> > > >     at org\.apache\.solr\.servlet\.SolrDispatchFilter\.init(SolrDispatchFilter\.java:83)
>>> > > >     at org\.apache\.catalina\.core\.ApplicationFilterConfig\.getFilter(ApplicationFilterConfig\.java:259)
>>> > > >     at org\.apache\.catalina\.core\.ApplicationFilterChain\.internalDoFilter(ApplicationFilterChain\.java:237)
>>> > > >     at org\.apache\.catalina\.core\.ApplicationFilterChain\.doFilter(ApplicationFilterChain\.java:215)
>>> > > >     at org\.apache\.catalina\.core\.StandardWrapperValve\.invoke(StandardWrapperValve\.java:277)
>>> > > >     at org\.apache\.catalina\.core\.StandardContextValve\.invoke(StandardContextValve\.java:188)
>>> > > >     at org\.apache\.catalina\.core\.StandardPipeline\.invoke(StandardPipeline\.java:641)
>>> > > >     at com\.sun\.enterprise\.web\.WebPipeline\.invoke(WebPipeline\.java:97)
>>> > > >     at com\.sun\.enterprise\.web\.PESessionLockingStandardPipeline\.invoke(PESessionLockingStandardPipeline\.java:85)
>>> > > >     at org\.apache\.catalina\.core\.StandardHostValve\.invoke(StandardHostValve\.java:185)
>>> > > >     at org\.apache\.catalina\.connector\.CoyoteAdapter\.doService(CoyoteAdapter\.java:332)
>>> > > >     at org\.apache\.catalina\.connector\.CoyoteAdapter\.service(CoyoteAdapter\.java:233)
>>> > > >     at com\.sun\.enterprise\.v3\.services\.impl\.ContainerMapper\.service(ContainerMapper\.java:165)
>>> > > >     at com\.sun\.grizzly\.http\.ProcessorTask\.invokeAdapter(ProcessorTask\.java:791)
>>> > > >     at com\.sun\.grizzly\.http\.ProcessorTask\.doProcess(ProcessorTask\.java:693)
>>> > > >     at com\.sun\.grizzly\.http\.ProcessorTask\.process(ProcessorTask\.java:954)
>>> > > >     at com\.sun\.grizzly\.http\.DefaultProtocolFilter\.execute(DefaultProtocolFilter\.java:170)
>>> > > >     at com\.sun\.grizzly\.DefaultProtocolChain\.executeProtocolFilter(DefaultProtocolChain\.java:135)
>>> > > >     at com\.sun\.grizzly\.DefaultProtocolChain\.execute(DefaultProtocolChain\.java:102)
>>> > > >     at com\.sun\.grizzly\.DefaultProtocolChain\.execute(DefaultProtocolChain\.java:88)
>>> > > >     at com\.sun\.grizzly\.http\.HttpProtocolChain\.execute(HttpProtocolChain\.java:76)
>>> > > >     at com\.sun\.grizzly\.ProtocolChainContextTask\.doCall(ProtocolChainContextTask\.java:53)
>>> > > >     at com\.sun\.grizzly\.SelectionKeyContextTask\.call(SelectionKeyContextTask\.java:57)
>>> > > >     at com\.sun\.grizzly\.ContextTask\.run(ContextTask\.java:69)
>>> > > >     at com\.sun\.grizzly\.util\.AbstractThreadPool$Worker\.doWork(AbstractThreadPool\.java:330)
>>> > > >     at com\.sun\.grizzly\.util\.AbstractThreadPool$Worker\.run(AbstractThreadPool\.java:309)
>>> > > >     at java\.lang\.Thread\.run(Thread\.java:619)
>>> > > > Caused by: org\.apache\.lucene\.index\.CorruptIndexException: Incompatible format version: 2 expected 1 or lower
>>> > > >     at org\.apache\.lucene\.index\.FieldsReader\.<init>(FieldsReader\.java:117)
>>> > > >     at org\.apache\.lucene\.index\.SegmentReader$CoreReaders\.openDocStores(SegmentReader\.java:277)
>>> > > >     at org\.apache\.lucene\.index\.SegmentReader\.get(SegmentReader\.java:640)
>>> > > >     at org\.apache\.lucene\.index\.SegmentReader\.get(SegmentReader\.java:599)
>>> > > >     at org\.apache\.lucene\.index\.DirectoryReader\.<init>(DirectoryReader\.java:104)
>>> > > >     at org\.apache\.lucene\.index\.ReadOnlyDirectoryReader\.<init>(ReadOnlyDirectoryReader\.java:27)
>>> > > >     at org\.apache\.lucene\.index\.DirectoryReader$1\.doBody(DirectoryReader\.java:74)
>>> > > >     at org\.apache\.lucene\.index\.SegmentInfos$FindSegmentsFile\.run(SegmentInfos\.java:704)
>>> > > >     at org\.apache\.lucene\.index\.DirectoryReader\.open(DirectoryReader\.java:69)
>>> > > >     at org\.apache\.lucene\.index\.IndexReader\.open(IndexReader\.java:476)
>>> > > >     at org\.apache\.lucene\.index\.IndexReader\.open(IndexReader\.java:403)
>>> > > >     at org\.apache\.solr\.core\.StandardIndexReaderFactory\.newReader(StandardIndexReaderFactory\.java:38)
>>> > > >     at org\.apache\.solr\.core\.SolrCore\.getSearcher(SolrCore\.java:1057)
>>> > > >     \.\.\. 29 more
>>> > > >
>>> > > > Would someone help me out?
>>> > > >
>>> > > > Thanks,
>>> > > > Steve Cohen
>>> > >
>>> >
>>>
>>
>>
>
