Re: Help needed in breaking large index file into smaller ones

2017-01-09 Thread Manan Sheth
Additionally, to answer Anshum's queries:

We are currently using Solr 4.10 and planning to upgrade to Solr 6.2.1; the
upgrade process is creating the current problem.

We are using SolrCloud with 8-10 shards split across different nodes, with
segment sizes of ~30 GB for some collections and 10-12 GB across the board.

This is due to performance concerns and the relatively limited RAM available
(currently ~32 GB/node).

Yes, we want the data together in a single collection.

Thanks,
Manan Sheth

From: Manan Sheth <manan.sh...@impetus.co.in>
Sent: Tuesday, January 10, 2017 10:51 AM
To: solr-user
Subject: Re: Help needed in breaking large index file into smaller ones

Hi Erick,

It's due to some past issues observed with joins on Solr 4, which hit OOM when
joining large indexes after optimization/compaction. If the indexes are stored
as smaller files, they fit into memory and the operations perform properly.
Slow writes/commits/updates have also been observed with large files. Thus, to
minimize this risk while upgrading to Solr 6, we want to store the indexes in
smaller files.

Thanks,
Manan Sheth

From: Erick Erickson <erickerick...@gmail.com>
Sent: Tuesday, January 10, 2017 5:24 AM
To: solr-user
Subject: Re: Help needed in breaking large index file into smaller ones

Why do you have a requirement that the indexes be < 4 GB? If it's
arbitrarily imposed, why bother?

Or is it a non-negotiable requirement imposed by the platform you're on?

Because just splitting the files into a smaller set won't help you: if
you then start to index into it, the merge process will just recreate
them.

You might be able to do something with the settings in
TieredMergePolicy to stop generating files > 4 GB in the first place.
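For reference, a sketch of what that might look like in solrconfig.xml, assuming the Solr 6.x mergePolicyFactory syntax (element names and defaults should be verified against your exact version); maxMergedSegmentMB caps the size of segments produced by natural merges:

```xml
<indexConfig>
  <!-- Cap naturally-merged segments at roughly 4 GB (value is in MB). -->
  <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
    <double name="maxMergedSegmentMB">4096.0</double>
  </mergePolicyFactory>
</indexConfig>
```

Note this bounds only natural merges; an explicit optimize/forceMerge can still produce larger segments.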

Best,
Erick

On Mon, Jan 9, 2017 at 3:27 PM, Anshum Gupta <ans...@anshumgupta.net> wrote:
> Can you provide more information about:
> - Are you using Solr in standalone or SolrCloud mode? What version of Solr?
> - Why do you want this? Lack of disk space? Uneven distribution of data on
> shards?
> - Do you want this data together i.e. as part of a single collection?
>
> You can check out the following APIs:
> SPLITSHARD:
> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api3
> MIGRATE:
> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api12
>
> Among other things, make sure you have enough spare disk-space before
> trying out the SPLITSHARD API in particular.
>
> -Anshum
>
>
>
> On Mon, Jan 9, 2017 at 12:08 PM Mikhail Khludnev <m...@apache.org> wrote:
>
>> Perhaps you can copy this index into a second location, delete the odd
>> docs from one copy and the even docs from the other, and then force merge
>> to a single segment in each location separately.
>> Perhaps shard splitting in SolrCloud does something like that.
>>
>> On Mon, Jan 9, 2017 at 1:12 PM, Narsimha Reddy CHALLA <
>> chnredd...@gmail.com>
>> wrote:
>>
>> > Hi All,
>> >
>> >   My solr server has a few large index files (say ~10G). I am looking
>> > for some help on breaking them into smaller ones (each < 4G) to satisfy
>> > my application requirements. Are there any such tools available?
>> >
>> > Appreciate your help.
>> >
>> > Thanks
>> > NRC
>> >
>>
>>
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>>








NOTE: This message may contain information that is confidential, proprietary, 
privileged or otherwise protected by law. The message is intended solely for 
the named addressee. If received in error, please destroy and notify the 
sender. Any use of this email is prohibited when received in error. Impetus 
does not represent, warrant and/or guarantee, that the integrity of this 
communication has been maintained nor that the communication is free of errors, 
virus, interception or interference.








Re: Help needed in breaking large index file into smaller ones

2017-01-09 Thread Manan Sheth
Does this really work for Lucene index files?

Thanks,
Manan Sheth

From: Moenieb Davids <moenieb.dav...@gpaa.gov.za>
Sent: Monday, January 9, 2017 7:36 PM
To: solr-user@lucene.apache.org
Subject: RE: Help needed in breaking large index file into smaller ones

Hi,

Try `split` on Linux or Unix:

split -l 100 originalfile.csv

This will split the file into chunks of 100 lines each. See the other options,
such as splitting by size.
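For what it's worth, a quick illustration of what `split -l` does with an ordinary text file (the output names xaa, xab, ... are the defaults; the demo file is made up). Note that `split` is line/byte oriented, so it cannot produce usable Lucene index files — which is the concern raised in the reply above:

```shell
# Create a hypothetical 250-line demo file.
seq 250 > originalfile.csv

# Split into chunks of 100 lines each: xaa (100), xab (100), xac (50).
split -l 100 originalfile.csv

wc -l xaa xab xac
```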


-----Original Message-----
From: Narsimha Reddy CHALLA [mailto:chnredd...@gmail.com]
Sent: 09 January 2017 12:12 PM
To: solr-user@lucene.apache.org
Subject: Help needed in breaking large index file into smaller ones

Hi All,

  My solr server has a few large index files (say ~10G). I am looking for
some help on breaking them into smaller ones (each < 4G) to satisfy my
application requirements. Are there any such tools available?

Appreciate your help.

Thanks
NRC










===
GPAA e-mail Disclaimers and confidential note

This e-mail is intended for the exclusive use of the addressee only.
If you are not the intended recipient, you should not use the contents
or disclose them to any other person. Please notify the sender immediately
and delete the e-mail. This e-mail is not intended nor
shall it be taken to create any legal relations, contractual or otherwise.
Legally binding obligations can only arise for the GPAA by means of
a written instrument signed by an authorised signatory.
===










IndexWriter.forceMerge not working as desired

2017-01-09 Thread Manan Sheth
Hi All,


While merging an index through the IndexWriter.forceMerge method in Solr 6.2.1,
I am passing the argument as 30, but it is still merging all the data (the
earlier collection used to have 10 segments) into a single segment. Please
provide some information to help me understand this behaviour.
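One point about the API contract that may be relevant here: IndexWriter.forceMerge(n) treats n as an upper bound ("merge until there are at most n segments"), not a target, so ending up below n — even at 1 — is legal behaviour. A minimal sketch against the Lucene 6.x API (the index path is an illustrative assumption; verify the signatures for your exact version):

```java
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class BoundedMerge {
  public static void main(String[] args) throws Exception {
    TieredMergePolicy tmp = new TieredMergePolicy();
    tmp.setMaxMergedSegmentMB(4096.0);   // cap natural merges at ~4 GB

    IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer())
        .setMergePolicy(tmp);

    try (Directory dir = FSDirectory.open(Paths.get("/path/to/index"));
         IndexWriter w = new IndexWriter(dir, iwc)) {
      // forceMerge(30) means "at most 30 segments": an index that already
      // has 10 segments satisfies the bound, and nothing requires the
      // result to stay at 10 -- the merge policy may merge further.
      w.forceMerge(30);
    }
  }
}
```

If something in the pipeline (for example an index-upgrade step) calls forceMerge(1) internally, a single segment would be the expected outcome regardless of the value passed here.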


Thanks,

Manan Sheth










Re: Help needed in breaking large solr index file into smaller ones

2017-01-09 Thread Manan Sheth
Hi All,

I have a problem similar to this one: the indexes on multiple Solr shards have
grown into large index files (~10 GB each), and I want to split these large
files on each shard into smaller files.

Please provide some guidelines.

Thanks,
Manan Sheth

From: Narsimha Reddy CHALLA <chnredd...@gmail.com>
Sent: Monday, January 9, 2017 3:51 PM
To: solr-user@lucene.apache.org
Subject: Help needed in breaking large solr index file into smaller ones

Hi All,

  My solr server has a few large index files (say ~10G). I am looking
for some help on breaking them into smaller ones (each < 4G) to satisfy
my application requirements. Basically, I am not looking for any
optimization of the index here (e.g. optimize, expungeDeletes, etc.).

Are there any such tools available?

Appreciate your help.

Thanks
NRC










Solr Index upgradation Merging issue observed

2017-01-08 Thread Manan Sheth
Hi All,


Currently, we are in the process of upgrading existing Solr indexes from Solr
4.x to Solr 6.2.1. To upgrade the existing indexes, we plan to run the
IndexUpgrader class sequentially: Solr 4.x to Solr 5.x, and then Solr 5.x
to Solr 6.2.1.
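A sketch of what those two sequential runs would look like from the command line; the jar versions and index path here are illustrative assumptions, and each step must be run per index directory. One thing worth confirming in the IndexUpgrader source for your versions: its upgrade() drives the rewrite through IndexWriter.forceMerge(1) (wrapped in an UpgradeIndexMergePolicy), which would be consistent with all segments ending up merged into one.

```shell
# Sequential upgrade sketch (4.x format -> 5.x format -> 6.x format).
# Jar versions and INDEX path are illustrative assumptions.
INDEX=/var/solr/data/collection1/data/index

# Step 1: rewrite the 4.x segments into the 5.x format.
java -cp lucene-core-5.5.3.jar:lucene-backward-codecs-5.5.3.jar \
  org.apache.lucene.index.IndexUpgrader -verbose "$INDEX"

# Step 2: rewrite the 5.x segments into the 6.x format.
java -cp lucene-core-6.2.1.jar:lucene-backward-codecs-6.2.1.jar \
  org.apache.lucene.index.IndexUpgrader -verbose "$INDEX"
```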


While performing the upgrade, a strange behaviour is noticed: all the previous
segments are merged into one single large segment. We need to preserve the
original segments, as the single large segment is getting bulky (~2-4 TB).


Please let me know how to tune the process, or how to write custom logic to overcome this.


Thanks,

Manan Sheth










Re: Solr MapReduce Indexer Tool is failing for empty core name.

2017-01-01 Thread Manan Sheth
Hi All,

Please help me out if anyone has executed the Solr MapReduce indexer tool with
Solr 6. It is still failing, and there are no hints for the error shown in the
mail thread below.

Thanks,
Manan Sheth

From: Manan Sheth
Sent: Friday, December 16, 2016 2:52 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr MapReduce Indexer Tool is failing for empty core name.

That's what I presume, and it should use the collection only. The collection
param has already been specified, and it should take all the details from
there. Also, the core-to-collection change happened in Solr 4. The MapReduce
indexer for Solr 4.10 works correctly with this, but not for Solr 6.

From: Reth RM <reth.ik...@gmail.com>
Sent: Friday, December 16, 2016 12:45 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr MapReduce Indexer Tool is failing for empty core name.

The primary difference in later versions has been the move from standalone
Solr to SolrCloud, starting from Solr 4.0. What happens if you try starting
Solr in standalone mode? SolrCloud does not consider 'core' anymore; it takes
'collection' as the param.

On Thu, Dec 15, 2016 at 11:05 PM, Manan Sheth <manan.sh...@impetus.co.in>
wrote:

> Thanks Reth. As noted, this is the same MapReduce-based indexer tool that
> comes shipped with the Solr distribution by default.
>
> It only takes the zk_host details and extracts all required information
> from there. It does not have core-specific configuration. The same
> tool released with the Solr 4.10 distro works correctly; it seems to be
> some issue/change from Solr 5 onwards. I have tested both Solr 5.5
> and Solr 6.2.1, and the behaviour is the same for both.
>
> Thanks,
> Manan Sheth
> 
> From: Reth RM <reth.ik...@gmail.com>
> Sent: Friday, December 16, 2016 12:21 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr MapReduce Indexer Tool is failing for empty core name.
>
> It looks like the command line tool that you are using to initiate the index
> process is expecting some name for the solr-core via a command line
> param. Use -help on the command line tool you are using and check the
> solr-core-name parameter key; pass that as well, with some value.
>
>
> On Tue, Dec 13, 2016 at 5:44 AM, Manan Sheth <manan.sh...@impetus.co.in>
> wrote:
>
> > Hi All,
> >
> >
> > While working on a migration project from Solr 4 to Solr 6, I need to
> > reindex my data using Solr map reduce Indexer tool in offline mode with
> > avro data.
> >
> > While executing the MapReduce indexer tool shipped with Solr 6.2.1, it is
> > throwing an error: cannot create core with empty name value. The Solr
> > instances are running fine, with new indexes being added and modified
> > correctly. Below is the command that was being fired:
> >
> >
> > hadoop --config /etc/hadoop/conf jar /home/impadmin/solr-6.2.1/dist/solr-map-reduce-*.jar \
> >     -D 'mapred.child.java.opts=-Xmx500m' \
> >     -libjars `echo /home/impadmin/solr6lib/*.jar | sed 's/ /,/g'` \
> >     --morphline-file /home/impadmin/app_quotes_morphline_actual.conf \
> >     --zk-host 172.26.45.71:9984 \
> >     --output-dir hdfs://impetus-i0056.impetus.co.in:8020/user/impadmin/MapReduceIndexerTool/output5 \
> >     --collection app.quotes --log4j src/test/resources/log4j.properties \
> >     --verbose \
> >     "hdfs://impetus-i0056.impetus.co.in:8020/user/impadmin/MapReduceIndexerTool/5d63e0f8-afc1-483e-bd3f-d508c885d794-00"
> >
> >
> > Below is the complete snapshot of error trace:
> >
> >
> > Failed to initialize record writer for
> > org.apache.solr.hadoop.MapReduceIndexerTool/MorphlineMapper,
> > attempt_1479795440861_0343_r_00_0
> >     at org.apache.solr.hadoop.SolrRecordWriter.<init>(SolrRecordWriter.java:128)
> >     at org.apache.solr.hadoop.SolrOutputFormat.getRecordWriter(SolrOutputFormat.java:163)
> >     at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:540)
> >     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:614)
> >     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
> >     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at javax.security.auth.Subject.doAs(Subject.java:422)
> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
> >     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java

Solr MapReduce Indexer Tool is failing for empty core name.

2016-12-13 Thread Manan Sheth
Hi All,


While working on a migration project from Solr 4 to Solr 6, I need to reindex
my data using the Solr MapReduce indexer tool in offline mode with Avro data.

While executing the MapReduce indexer tool shipped with Solr 6.2.1, it is
throwing an error: cannot create core with empty name value. The Solr
instances are running fine, with new indexes being added and modified
correctly. Below is the command that was being fired:


hadoop --config /etc/hadoop/conf jar /home/impadmin/solr-6.2.1/dist/solr-map-reduce-*.jar \
    -D 'mapred.child.java.opts=-Xmx500m' \
    -libjars `echo /home/impadmin/solr6lib/*.jar | sed 's/ /,/g'` \
    --morphline-file /home/impadmin/app_quotes_morphline_actual.conf \
    --zk-host 172.26.45.71:9984 \
    --output-dir hdfs://impetus-i0056.impetus.co.in:8020/user/impadmin/MapReduceIndexerTool/output5 \
    --collection app.quotes --log4j src/test/resources/log4j.properties \
    --verbose \
    "hdfs://impetus-i0056.impetus.co.in:8020/user/impadmin/MapReduceIndexerTool/5d63e0f8-afc1-483e-bd3f-d508c885d794-00"


Below is the complete snapshot of error trace:


Failed to initialize record writer for
org.apache.solr.hadoop.MapReduceIndexerTool/MorphlineMapper,
attempt_1479795440861_0343_r_00_0
    at org.apache.solr.hadoop.SolrRecordWriter.<init>(SolrRecordWriter.java:128)
    at org.apache.solr.hadoop.SolrOutputFormat.getRecordWriter(SolrOutputFormat.java:163)
    at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:540)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:614)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: org.apache.solr.common.SolrException: Cannot create core with empty name value
    at org.apache.solr.core.CoreDescriptor.checkPropertyIsNotEmpty(CoreDescriptor.java:280)
    at org.apache.solr.core.CoreDescriptor.<init>(CoreDescriptor.java:191)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:754)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:742)
    at org.apache.solr.hadoop.SolrRecordWriter.createEmbeddedSolrServer(SolrRecordWriter.java:163)
    at org.apache.solr.hadoop.SolrRecordWriter.<init>(SolrRecordWriter.java:121)
    ... 9 more

Additional points to note:


  *   The solrconfig and schema files are copied as-is from Solr 4.
  *   Once the collection is deployed, users can perform all operations on the
collection without any issue.
  *   The indexing process works fine with the same tool on Solr 4.

Please help.


Thanks,

Manan Sheth







