ICUFoldingFilter with swedish characters, and tokens with the keyword attribute?

2017-01-09 Thread jimi.hullegard
Hi,

I wasn't happy with how our current Solr configuration handled diacritics (like 
'é') in the text and in search queries, since it simply considered the letter 
with a diacritic to be a distinct letter. I.e. 'é' didn't match 'e', and vice versa. 
Except for a handful of rare words where the diacritical sign in 'é' actually 
changes the word's meaning, it is usually used in names of people and places, and 
the expected behavior when searching is to not have to type the diacritics and 
still get the expected results (like searching for 'Penelope Cruz' and getting 
hits for 'Penélope Cruz').

When reading online about how to handle diacritics in solr, it seems that the 
general recommendation, when no language specific solution exists that handles 
this, is to use the ICUFoldingFilter. However this filter doesn't really come 
with a lot of documentation, and doesn't seem to have any configuration options 
at all (at least not documented).

So what I ended up doing was simply to add the ICUFoldingFilterFactory in 
the middle of the existing analyzer chain, like this:
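(Roughly like the following; the tokenizer and the other filters shown here are just 
illustrative stand-ins rather than our exact chain, the point is only where the 
ICUFoldingFilterFactory ended up sitting:)

<fieldType name="text_sv" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- marks protected words so later stemming-aware filters leave them alone -->
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <!-- the folding filter, added in the middle of the existing chain -->
    <filter class="solr.ICUFoldingFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Swedish"/>
  </analyzer>
</fieldType>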


But that didn't really give me the results I want. For example, using the 
analysis debug tool I see that the text 'café åäö' becomes 'cafe caf aao'. And 
there are two problems with that result:

1. It doesn't respect the keyword attribute
2. It folds the Swedish characters 'åäö' into 'aao'

The disregard of the keyword attribute is bad enough, but the mangling of the 
Swedish language is really a show stopper for us. The Swedish language doesn't 
consider 'ö', for example, to be the letter 'o' with two diacritical dots above 
it, just as 'Q' isn't considered to be the letter 'O' with a diacritical 
"squiggly line" at the bottom. So when handling Swedish text, these characters 
('åäöÅÄÖ') shouldn't be folded, because then there will be too many "collisions".

For example, when searching for 'påstå' ('claim'), one doesn't want hits about 
'pasta' (you guessed it, it means 'pasta'), just as one doesn't want to get 
hits about 'aga' ('corporal punishment, usually against children') when 
searching for 'äga' ('to own'). Or even worse, when searching for 'höra' ('to 
hear'), one most likely doesn't want hits about 'hora' ('prostitute'). And I 
can go on... :)

So, is there a way for us to make the ICUFoldingFilter work in a better way? I.e. 
configure it to respect the keyword attribute and to ignore the 'åäö' characters when 
folding, but otherwise fold all diacritical characters into their non-diacritical 
form. Or how would you recommend we configure our analyzer chain to 
accomplish this?

Regards
/Jimi


Re: Help needed in breaking large index file into smaller ones

2017-01-09 Thread Manan Sheth
Additionally, to answer Anshum's queries:

We are currently using Solr 4.10 and planning to upgrade to Solr 6.2.1; the 
upgrade process is what is creating the current problem.

We are using it in SolrCloud with 8-10 shards split across different nodes, each 
having segment sizes of ~30 GB for some collections and 10-12 GB across the 
board.

This is due to performance and the relative lack of large RAM (currently ~32 
GB/node).

Yes, we want the data together in a single collection.

Thanks,
Manan Sheth

From: Manan Sheth 
Sent: Tuesday, January 10, 2017 10:51 AM
To: solr-user
Subject: Re: Help needed in breaking large index file into smaller ones

Hi Erick,

It's due to some past issues observed with joins on Solr 4, which hit OOM when 
joining large indexes after optimization/compaction; if the indexes are stored as 
smaller files, they fit into memory and the operations perform appropriately. Also, 
slow writes/commits/updates are observed with large files. Thus, to minimize this 
risk while upgrading to Solr 6, we wanted to store the indexes in smaller files.

Thanks,
Manan Sheth

From: Erick Erickson 
Sent: Tuesday, January 10, 2017 5:24 AM
To: solr-user
Subject: Re: Help needed in breaking large index file into smaller ones

Why do you have a requirement that the indexes be < 4G? If it's
arbitrarily imposed why bother?

Or is it a non-negotiable requirement imposed by the platform you're on?

Because just splitting the files into a smaller set won't help you if
you then start to index into it, the merge process will just recreate
them.

You might be able to do something with the settings in
TieredMergePolicy in the first place to stop generating files > 4g..

Best,
Erick

On Mon, Jan 9, 2017 at 3:27 PM, Anshum Gupta  wrote:
> Can you provide more information about:
> - Are you using Solr in standalone or SolrCloud mode? What version of Solr?
> - Why do you want this? Lack of disk space? Uneven distribution of data on
> shards?
> - Do you want this data together i.e. as part of a single collection?
>
> You can check out the following APIs:
> SPLITSHARD:
> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api3
> MIGRATE:
> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api12
>
> Among other things, make sure you have enough spare disk-space before
> trying out the SPLITSHARD API in particular.
>
> -Anshum
>
>
>
> On Mon, Jan 9, 2017 at 12:08 PM Mikhail Khludnev  wrote:
>
>> Perhaps you can copy this index into a separate location. Remove odd and
>> even docs into former and later indexes consequently, and then force merge
>> to single segment in both locations separately.
>> Perhaps shard splitting in SolrCloud does something like that.
>>
>> On Mon, Jan 9, 2017 at 1:12 PM, Narsimha Reddy CHALLA <
>> chnredd...@gmail.com>
>> wrote:
>>
>> > Hi All,
>> >
>> >   My solr server has a few large index files (say ~10G). I am looking
>> > for some help on breaking them it into smaller ones (each < 4G) to
>> satisfy
>> > my application requirements. Are there any such tools available?
>> >
>> > Appreciate your help.
>> >
>> > Thanks
>> > NRC
>> >
>>
>>
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>>










Re: Help needed in breaking large index file into smaller ones

2017-01-09 Thread Manan Sheth
Hi Erick,

It's due to some past issues observed with joins on Solr 4, which hit OOM when 
joining large indexes after optimization/compaction; if the indexes are stored as 
smaller files, they fit into memory and the operations perform appropriately. Also, 
slow writes/commits/updates are observed with large files. Thus, to minimize this 
risk while upgrading to Solr 6, we wanted to store the indexes in smaller files.

Thanks,
Manan Sheth

From: Erick Erickson 
Sent: Tuesday, January 10, 2017 5:24 AM
To: solr-user
Subject: Re: Help needed in breaking large index file into smaller ones

Why do you have a requirement that the indexes be < 4G? If it's
arbitrarily imposed why bother?

Or is it a non-negotiable requirement imposed by the platform you're on?

Because just splitting the files into a smaller set won't help you if
you then start to index into it, the merge process will just recreate
them.

You might be able to do something with the settings in
TieredMergePolicy in the first place to stop generating files > 4g..

Best,
Erick

On Mon, Jan 9, 2017 at 3:27 PM, Anshum Gupta  wrote:
> Can you provide more information about:
> - Are you using Solr in standalone or SolrCloud mode? What version of Solr?
> - Why do you want this? Lack of disk space? Uneven distribution of data on
> shards?
> - Do you want this data together i.e. as part of a single collection?
>
> You can check out the following APIs:
> SPLITSHARD:
> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api3
> MIGRATE:
> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api12
>
> Among other things, make sure you have enough spare disk-space before
> trying out the SPLITSHARD API in particular.
>
> -Anshum
>
>
>
> On Mon, Jan 9, 2017 at 12:08 PM Mikhail Khludnev  wrote:
>
>> Perhaps you can copy this index into a separate location. Remove odd and
>> even docs into former and later indexes consequently, and then force merge
>> to single segment in both locations separately.
>> Perhaps shard splitting in SolrCloud does something like that.
>>
>> On Mon, Jan 9, 2017 at 1:12 PM, Narsimha Reddy CHALLA <
>> chnredd...@gmail.com>
>> wrote:
>>
>> > Hi All,
>> >
>> >   My solr server has a few large index files (say ~10G). I am looking
>> > for some help on breaking them it into smaller ones (each < 4G) to
>> satisfy
>> > my application requirements. Are there any such tools available?
>> >
>> > Appreciate your help.
>> >
>> > Thanks
>> > NRC
>> >
>>
>>
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>>










Re: term frequency solrj

2017-01-09 Thread Shawn Heisey
On 1/9/2017 6:31 AM, huda barakat wrote:
> Can anybody help me, I need to get term frequency for a specific
> filed, I use the techproduct example and I use this code:

The variable "terms" is null on line 29, which is why you are getting
NullPointerException.

> query.setRequestHandler("terms"); 

One possible problem is setting the request handler to "terms" ...
chances are that this should be "/terms" instead.  Handler names in your
config will most likely start with a forward slash, because if they
don't, a typical example config in version 3.6 and later doesn't allow
any way for them to be used.  Since 3.6, "handleSelect" is set to false
in all examples, and it should be left at false.
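For example, a minimal sketch along those lines might look like the following (it 
assumes the stock /terms handler that ships with the techproducts example; the URL 
and field name are just the ones from your snippet):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.TermsResponse;

public class TermsExample {
    public static void main(String[] args) throws Exception {
        SolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/techproducts").build();

        SolrQuery query = new SolrQuery();
        // Leading slash so the request actually goes to the /terms handler.
        query.setRequestHandler("/terms");
        query.setTerms(true);          // terms=true
        query.addTermsField("name");   // terms.fl=name

        QueryResponse response = solr.query(query);
        TermsResponse terms = response.getTermsResponse();
        for (TermsResponse.Term t : terms.getTerms("name")) {
            System.out.println(t.getTerm() + " -> " + t.getFrequency());
        }
        solr.close();
    }
}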

Thanks,
Shawn



Re: Loading Third party libraries along with Solr

2017-01-09 Thread Shawn Heisey
On 1/9/2017 11:35 AM, Shashank Pedamallu wrote:
> I’m Shashank. I’m new to Solr and was trying to use amazon-aws sdk
> along with Solr. I added amazon-aws.jar and its third party
> dependencies under /solr-6.3.0/server/solr/lib folder. Even after I
> add all required dependencies, I keep getting NoClassDefinitionError
> and NoSuchMethod Errors. I see that some of the third party jars such
> as jackson-core, jackson-mapper-asl libraries are part of
> /solr-6.3.0/server/solr/solr-webapp/WEB-INF/lib, but of different
> versions. The classes in these jars are the ones causing the issue.
> Could someone help me with loading these dependencies (amazon-aws
> third party libs) appropriately to not cause issue with the rest of
> the jars.

The first thing to try would be to simply use the dependencies already
present in Solr.  If the component you are using can't work with the
older version of a third-party library that already exists in Solr, then
you will have to upgrade the third-party libraries in Solr.  This means
replacing those jars in the WEB-INF/lib directory, not adding them to
the user lib directory.  Having multiple versions of any library causes
problems.

Note that if you do upgrade jars in WEB-INF/lib, Solr itself may stop
working correctly.  It's *usually* pretty safe to upgrade an internal
Solr dependency as long as you're not upgrading to a new major version,
but it doesn't always work.

Sometimes it is simply not possible to combine Java projects in the way
you want, because each of them use a dependency in ways that are not
compatible with each other.  Here's an example of something that just
won't work because of problems with a dependency:

https://issues.apache.org/jira/browse/SOLR-5582

Thanks,
Shawn



Re: Failure when trying to full sync, out of space ? Doesn't delete old segments before full sync?

2017-01-09 Thread Shawn Heisey
On 11/28/2016 11:06 AM, Walter Underwood wrote:
> Worst case:
> 1. Disable merging.
> 2. Delete all the documents.
> 3. Add all the documents.
> 4. Enable merging.
>
> After step 3, you have two copies of everything, one deleted copy and one new 
> copy.
> The merge makes a third copy.

Just getting around to replying to this REALLY old thread.

What you've described doesn't really triple the size of the index at the
optimize/forceMerge step.  While it's true that the index is temporarily
three times its *final* size, it is not three times the pre-optimize
size -- in fact, it would only get 50 percent larger.  (If the final index is
S, then after step 3 there is a deleted copy plus a new copy, roughly 2S; the
merge then writes another S, so the peak is about 3S, i.e. only 1.5 times the
pre-merge size.)

Does this mean that the recommendation saying "need 3x the space for
normal operation" is not in fact true?  The Lucene folks seem to be
pretty adamant in their assertion that a merge to one segment can triple
the index size, although I've never seen it actually happen.  Disabling
and re-enabling merging is not what I would call "normal."

Thanks,
Shawn



Re: Solr Index upgradation Merging issue observed

2017-01-09 Thread Shawn Heisey
On 1/8/2017 11:21 PM, Manan Sheth wrote:
> Currently, We are in process of upgrading existing Solr indexes from Solr 4.x 
> to Solr 6.2.1. In order to upgrade existing indexes we are planning to use 
> IndexUpgrader class in sequential manner from Solr 4.x to Solr 5.x and Solr 
> 5.x to Solr 6.2.1.
>
> While performing the upgrade, a strange behaviour is noticed where all the 
> previous segments are getting merged into one single large segment. We need to 
> preserve the original segments, as the single large segment is getting bulkier (~ 
> 2-4 TBs).
>
> Please let me know how to tune the process or write custom logic to overcome 
> it.

I've taken a very quick look at the IndexUpgrader code.  It does the
upgrade by calling forceMerge ... which is the Lucene term for what Solr
still calls "optimize."  What you are seeing is completely normal.  This
is how it is designed to work.  Changing it *might* be possible, but it
would involve development work in Lucene.  It likely would not be quick
and easy.
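
For reference, a bare-bones sketch of invoking the tool directly looks roughly
like this (the index path is a placeholder):

import java.nio.file.Paths;
import org.apache.lucene.index.IndexUpgrader;
import org.apache.lucene.store.FSDirectory;

public class UpgradeOneIndex {
    public static void main(String[] args) throws Exception {
        // Opens the index directory and rewrites it in the current format.
        // As noted above, this goes through forceMerge internally, which is
        // why the segments collapse into one.
        try (FSDirectory dir = FSDirectory.open(Paths.get("/path/to/index"))) {
            new IndexUpgrader(dir).upgrade();
        }
    }
}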

Single-segment indexes are slightly faster than multi-segment indexes
containing the same data, so I am failing to see why this is a problem. 
After the upgrade, the total amount of disk space for your index would
either go down or stay the same, although it will temporarily double in
size during the upgrade.

FYI -- this kind of segment merging can happen as a consequence of
normal indexing, so this is something your system must be prepared to
handle even when you are not using the upgrader tool, on ANY version of
Solr.

Thanks,
Shawn



Re: CDCR logging is Needlessly verbose, fills up the file system fast

2017-01-09 Thread Shawn Heisey
On 12/22/2016 8:10 AM, Webster Homer wrote:
> While testing CDCR I found that it is writing tons of log messages per
> second. Example:
> 2016-12-21 23:24:41.652 INFO  (qtp110456297-13) [c:sial-catalog-material
> s:shard1 r:core_node1 x:sial-catalog-material_shard1_replica1]
> o.a.s.c.S.Request [sial-catalog-material_shard1_replica1]  webapp=/solr
> path=/cdcr params={qt=/cdcr&action=BOOTSTRAP_STATUS&wt=javabin&version=2}
> status=0 QTime=0
> 2016-12-21 23:24:41.653 INFO  (qtp110456297-18) [c:sial-catalog-material
> s:shard1 r:core_node1 x:sial-catalog-material_shard1_replica1]
> o.a.s.c.S.Request [sial-catalog-material_shard1_replica1]  webapp=/solr
> path=/cdcr params={qt=/cdcr&action=BOOTSTRAP_STATUS&wt=javabin&version=2}
> status=0 QTime=0

I hadn't looked closely at the messages you were seeing in your logs
until now.

These messages are *request* logging.  This is the same code path that
logs every query -- it's not specific to CDCR.  It's just logging all
the requests that Solr is receiving.  If this log message were changed
to DEBUG, then Solr would not log queries by default.  A large number of
Solr users want that logging.

I think that you could probably avoid seeing these logs by configuring
log4j to not log things tagged
as org.apache.solr.core.SolrCore.Request (even though it's not a real
class, I think log4j can still configure it) ... but then you wouldn't
get your queries logged either.

In order to not log these particular messages, but still log queries and
other requests, the request logging code will need to have a way to
specify that certain messages should not be logged.  This might be
something that could be configurable at the request handler definition
level -- put something in the requestHandler configuration (for /cdcr in
this case) that tells it to skip logging.  That seems like a good
feature to have.

After looking at the CDCR configuration page in the reference guide, I
might have a little more insight.  You're getting one of these logs
every 1-2 milliseconds ... so it sounds like you have configured the
CDCR with a schedule of one millisecond.  The default value for the
replicator schedule is ten milliseconds, and the update log
synchronizer defaults to a full minute.  I'm guessing that CDCR is not
designed to have such a low schedule value.  I would personally
configure the replicator schedule even higher than the default --
network latency between Internet locations is often longer than ten
milliseconds.

Thanks,
Shawn



Re: CloudSolrStream can't set the setZkClientTimeout and setZkConnectTimeout properties

2017-01-09 Thread Joel Bernstein
Currently these are not settable. It's easy enough to add setters for these
values. What types of behaviors have you run into when CloudSolrClient is
having timeout issues?

Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Jan 9, 2017 at 10:06 AM, Yago Riveiro 
wrote:

> Hi,
>
> Using the CloudSolrStream, is it possible define the setZkConnectTimeout
> and
> setZkClientTimeout of internal CloudSolrClient?
>
> The default negotiation timeout is set to 10 seconds.
>
> Regards,
>
> /Yago
>
>
>
> -
> Best regards
>
> /Yago
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/CloudSolrStream-can-t-set-the-setZkClientTimeout-and-
> setZkConnectTimeout-properties-tp4313127.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Help needed in breaking large solr index file into smaller ones

2017-01-09 Thread Erick Erickson
Why? What do you think this will accomplish? I'm wondering if this is
an XY problem.

Best,
Erick

On Mon, Jan 9, 2017 at 7:48 AM, Manan Sheth  wrote:
> Hi All,
>
> I have a problem similar to this one, where the indexes in multiple solr 
> shards have created large index files (~10 GB each), and we want to split these 
> large files on each shard into smaller files.
>
> Please provide some guidelines.
>
> Thanks,
> Manan Sheth
> 
> From: Narsimha Reddy CHALLA 
> Sent: Monday, January 9, 2017 3:51 PM
> To: solr-user@lucene.apache.org
> Subject: Help needed in breaking large solr index file into smaller ones
>
> Hi All,
>
>   My solr server has a few large index files (say ~10G). I am looking
> for some help on breaking them it into smaller ones (each < 4G) to satisfy
> my application requirements. Basically, I am not looking for any
> optimization of index here (ex: optimize, expungeDeletes etc.).
>
> Are there any such tools available?
>
> Appreciate your help.
>
> Thanks
> NRC
>
> 
>
>
>
>
>
>
> NOTE: This message may contain information that is confidential, proprietary, 
> privileged or otherwise protected by law. The message is intended solely for 
> the named addressee. If received in error, please destroy and notify the 
> sender. Any use of this email is prohibited when received in error. Impetus 
> does not represent, warrant and/or guarantee, that the integrity of this 
> communication has been maintained nor that the communication is free of 
> errors, virus, interception or interference.


Re: Help needed in breaking large index file into smaller ones

2017-01-09 Thread Erick Erickson
Why do you have a requirement that the indexes be < 4G? If it's
arbitrarily imposed why bother?

Or is it a non-negotiable requirement imposed by the platform you're on?

Because just splitting the files into a smaller set won't help you if
you then start to index into it, the merge process will just recreate
them.

You might be able to do something with the settings in
TieredMergePolicy in the first place to stop generating files > 4g..
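
At the Lucene level the knob is roughly the following (a sketch only; in Solr you
would normally set the equivalent values through the merge policy section of
solrconfig.xml rather than in code, and the 4096 MB figure just mirrors the ~4G
limit discussed above):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;

public class CappedMergeConfig {
    public static IndexWriterConfig build() {
        TieredMergePolicy mp = new TieredMergePolicy();
        // Keep regular (natural) merges from producing segments bigger than ~4 GB.
        // Note that an explicit optimize/forceMerge ignores this cap.
        mp.setMaxMergedSegmentMB(4096.0);

        IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
        iwc.setMergePolicy(mp);
        return iwc;
    }
}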

Best,
Erick

On Mon, Jan 9, 2017 at 3:27 PM, Anshum Gupta  wrote:
> Can you provide more information about:
> - Are you using Solr in standalone or SolrCloud mode? What version of Solr?
> - Why do you want this? Lack of disk space? Uneven distribution of data on
> shards?
> - Do you want this data together i.e. as part of a single collection?
>
> You can check out the following APIs:
> SPLITSHARD:
> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api3
> MIGRATE:
> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api12
>
> Among other things, make sure you have enough spare disk-space before
> trying out the SPLITSHARD API in particular.
>
> -Anshum
>
>
>
> On Mon, Jan 9, 2017 at 12:08 PM Mikhail Khludnev  wrote:
>
>> Perhaps you can copy this index into a separate location. Remove odd and
>> even docs into former and later indexes consequently, and then force merge
>> to single segment in both locations separately.
>> Perhaps shard splitting in SolrCloud does something like that.
>>
>> On Mon, Jan 9, 2017 at 1:12 PM, Narsimha Reddy CHALLA <
>> chnredd...@gmail.com>
>> wrote:
>>
>> > Hi All,
>> >
>> >   My solr server has a few large index files (say ~10G). I am looking
>> > for some help on breaking them it into smaller ones (each < 4G) to
>> satisfy
>> > my application requirements. Are there any such tools available?
>> >
>> > Appreciate your help.
>> >
>> > Thanks
>> > NRC
>> >
>>
>>
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>>


Re: Help needed in breaking large index file into smaller ones

2017-01-09 Thread Anshum Gupta
Can you provide more information about:
- Are you using Solr in standalone or SolrCloud mode? What version of Solr?
- Why do you want this? Lack of disk space? Uneven distribution of data on
shards?
- Do you want this data together i.e. as part of a single collection?

You can check out the following APIs:
SPLITSHARD:
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api3
MIGRATE:
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api12

Among other things, make sure you have enough spare disk-space before
trying out the SPLITSHARD API in particular.

-Anshum



On Mon, Jan 9, 2017 at 12:08 PM Mikhail Khludnev  wrote:

> Perhaps you can copy this index into a separate location. Remove odd and
> even docs into former and later indexes consequently, and then force merge
> to single segment in both locations separately.
> Perhaps shard splitting in SolrCloud does something like that.
>
> On Mon, Jan 9, 2017 at 1:12 PM, Narsimha Reddy CHALLA <
> chnredd...@gmail.com>
> wrote:
>
> > Hi All,
> >
> >   My solr server has a few large index files (say ~10G). I am looking
> > for some help on breaking them it into smaller ones (each < 4G) to
> satisfy
> > my application requirements. Are there any such tools available?
> >
> > Appreciate your help.
> >
> > Thanks
> > NRC
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


Re: term frequency solrj

2017-01-09 Thread Mikhail Khludnev
Hello Huda,

Try to check this
https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/test/org/apache/solr/client/solrj/response/TermsResponseTest.java


On Mon, Jan 9, 2017 at 4:31 PM, huda barakat 
wrote:

> Hi,
> Can anybody help me, I need to get term frequency for a specific  filed, I
> use the techproduct example and I use this code:
>
> 
> //
> import java.util.List;
> import org.apache.solr.client.solrj.SolrClient;
> import org.apache.solr.client.solrj.SolrQuery;
> import org.apache.solr.client.solrj.impl.HttpSolrClient;
> import org.apache.solr.client.solrj.response.QueryResponse;
> import org.apache.solr.client.solrj.response.TermsResponse;
>
> public class App3 {
> public static void main(String[] args) throws Exception {
>
>   String urlString = "http://localhost:8983/solr/techproducts";
>   SolrClient solr = new HttpSolrClient.Builder(urlString).build();
>
>   SolrQuery query = new SolrQuery();
>
>   query.setQuery("*:*");
>
>   query.setRequestHandler("terms");
>
>
>   QueryResponse response = solr.query(query);
>   System.out.println("numFound: " +
> response.getResults().getNumFound());
>
>   TermsResponse termResp =response.getTermsResponse();
>
>   List terms = termResp.getTerms("name");
>   System.out.print("size="+ terms.size());
>
>
> }
> }
> 
>
> I get the following error :
>
> Exception in thread "main" numFound: 32
> java.lang.NullPointerException
> at testPkg.App3.main(App3.java:29)
>
>
> Thank you in advance,,,
> Huda
>



-- 
Sincerely yours
Mikhail Khludnev


Re: Help needed in breaking large index file into smaller ones

2017-01-09 Thread Mikhail Khludnev
Perhaps you can copy this index into a separate location. Remove odd and
even docs into former and later indexes consequently, and then force merge
to single segment in both locations separately.
Perhaps shard splitting in SolrCloud does something like that.

On Mon, Jan 9, 2017 at 1:12 PM, Narsimha Reddy CHALLA 
wrote:

> Hi All,
>
>   My solr server has a few large index files (say ~10G). I am looking
> for some help on breaking them it into smaller ones (each < 4G) to satisfy
> my application requirements. Are there any such tools available?
>
> Appreciate your help.
>
> Thanks
> NRC
>



-- 
Sincerely yours
Mikhail Khludnev


Loading Third party libraries along with Solr

2017-01-09 Thread Shashank Pedamallu
Hi,

I’m Shashank. I’m new to Solr and was trying to use amazon-aws sdk along with 
Solr. I added amazon-aws.jar and its third party dependencies under 
/solr-6.3.0/server/solr/lib folder. Even after I add all required dependencies, 
I keep getting NoClassDefinitionError and NoSuchMethod Errors. I see that some 
of the third party jars such as jackson-core, jackson-mapper-asl libraries are 
part of /solr-6.3.0/server/solr/solr-webapp/WEB-INF/lib, but of different 
versions. The classes in these jars are the ones causing the issue. Could 
someone help me with loading these dependencies (amazon-aws third party libs) 
appropriately to not cause issue with the rest of the jars.

Thanks,
Shashank Pedamallu



Soir Ulr entity

2017-01-09 Thread fabigol
Hi, I made a Solr project with multiple entities.
I want to launch the indexing of one entity with a URL.
How can I choose the entity that I want in my URL?

Thanks for your help



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Soir-Ulr-entity-tp4313172.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Question about Lucene FieldCache

2017-01-09 Thread Yago Riveiro
Ok, then I need to configure it to reduce the size of the cache.

Thanks for the help Mikhail.

--

/Yago Riveiro

On 9 Jan 2017 17:01 +, Mikhail Khludnev , wrote:
> This probably says why
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/core/SolrConfig.java#L258
>
> On Mon, Jan 9, 2017 at 4:41 PM, Yago Riveiro  wrote:
>
> > The documentation says that the only caches configurable are:
> >
> > - filterCache
> > - queryResultCache
> > - documentCache
> > - user defined caches
> >
> > There is no entry for fieldValueCache and in my case all of list in the
> > documentation are disable ...
> >
> > --
> >
> > /Yago Riveiro
> >
> > On 9 Jan 2017 13:20 +, Mikhail Khludnev , wrote:
> > > On Mon, Jan 9, 2017 at 2:17 PM, Yago Riveiro  > wrote:
> > >
> > > > Thanks for re reply Mikhail,
> > > >
> > > > Do you know if the 1 value is configurable?
> > >
> > > yes. in solrconfig.xml
> > > https://cwiki.apache.org/confluence/display/solr/Query+
> > Settings+in+SolrConfig#QuerySettingsinSolrConfig-Caches
> > > iirc you cant' fully disable it setting size to 0.
> > >
> > >
> > > > My insert rate is so high
> > > > (5000 docs/s) that the cache it's quite useless.
> > > >
> > > > In the case of the Lucene field cache, it's possible "clean" it in some
> > > > way?
> > > >
> > > > Even it would be possible, the first sorting query or so loads it back.
> > >
> > > > Some cache is eating my memory heap.
> > > >
> > > Probably you need to dedicate master which won't load FieldCache.
> > >
> > >
> > > >
> > > >
> > > >
> > > > -
> > > > Best regards
> > > >
> > > > /Yago
> > > > --
> > > > View this message in context: http://lucene.472066.n3.
> > > > nabble.com/Question-about-Lucene-FieldCache-tp4313062p4313069.html
> > > > Sent from the Solr - User mailing list archive at Nabble.com.
> > > >
> > >
> > >
> > >
> > > --
> > > Sincerely yours
> > > Mikhail Khludnev
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev


Re: Question about Lucene FieldCache

2017-01-09 Thread Mikhail Khludnev
This probably says why
https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/core/SolrConfig.java#L258

On Mon, Jan 9, 2017 at 4:41 PM, Yago Riveiro  wrote:

> The documentation says that the only caches configurable are:
>
> - filterCache
> - queryResultCache
> - documentCache
> - user defined caches
>
> There is no entry for fieldValueCache and in my case all of list in the
> documentation are disable ...
>
> --
>
> /Yago Riveiro
>
> On 9 Jan 2017 13:20 +, Mikhail Khludnev , wrote:
> > On Mon, Jan 9, 2017 at 2:17 PM, Yago Riveiro 
> wrote:
> >
> > > Thanks for re reply Mikhail,
> > >
> > > Do you know if the 1 value is configurable?
> >
> > yes. in solrconfig.xml
> > https://cwiki.apache.org/confluence/display/solr/Query+
> Settings+in+SolrConfig#QuerySettingsinSolrConfig-Caches
> > iirc you cant' fully disable it setting size to 0.
> >
> >
> > > My insert rate is so high
> > > (5000 docs/s) that the cache it's quite useless.
> > >
> > > In the case of the Lucene field cache, it's possible "clean" it in some
> > > way?
> > >
> > > Even it would be possible, the first sorting query or so loads it back.
> >
> > > Some cache is eating my memory heap.
> > >
> > Probably you need to dedicate master which won't load FieldCache.
> >
> >
> > >
> > >
> > >
> > > -
> > > Best regards
> > >
> > > /Yago
> > > --
> > > View this message in context: http://lucene.472066.n3.
> > > nabble.com/Question-about-Lucene-FieldCache-tp4313062p4313069.html
> > > Sent from the Solr - User mailing list archive at Nabble.com.
> > >
> >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
>



-- 
Sincerely yours
Mikhail Khludnev


RE: Help needed in breaking large index file into smaller ones

2017-01-09 Thread Moenieb Davids
Hi,

Apologies for my response, I did not read the question properly.
I was speaking about splitting files for import.

-Original Message-
From: billnb...@gmail.com [mailto:billnb...@gmail.com] 
Sent: 09 January 2017 05:45 PM
To: solr-user@lucene.apache.org
Subject: Re: Help needed in breaking large index file into smaller ones

Can you set Solr config segments to a higher number, don't optimize and you 
will get smaller files after a new index is created.

Can you reindex ?

Bill Bell
Sent from mobile


> On Jan 9, 2017, at 7:15 AM, Narsimha Reddy CHALLA  
> wrote:
> 
> No, it does not work by splitting. First of all lucene index files are 
> not text files. There is a segment_NN file which will refer index 
> files in a commit. So, when we split a large index file into smaller 
> ones, the corresponding segment_NN file also needs to be updated with 
> new index files OR a new segment_NN file should be created, probably.
> 
> Can someone who is familiar with lucene index files please help us in 
> this regard?
> 
> Thanks
> NRC
> 
> On Mon, Jan 9, 2017 at 7:38 PM, Manan Sheth 
> 
> wrote:
> 
>> Is this really works for lucene index files?
>> 
>> Thanks,
>> Manan Sheth
>> 
>> From: Moenieb Davids 
>> Sent: Monday, January 9, 2017 7:36 PM
>> To: solr-user@lucene.apache.org
>> Subject: RE: Help needed in breaking large index file into smaller 
>> ones
>> 
>> Hi,
>> 
>> Try split on linux or unix
>> 
>> split -l 100 originalfile.csv
>> this will split a file into 100 lines each
>> 
>> see other options for how to split like size
>> 
>> 
>> -Original Message-
>> From: Narsimha Reddy CHALLA [mailto:chnredd...@gmail.com]
>> Sent: 09 January 2017 12:12 PM
>> To: solr-user@lucene.apache.org
>> Subject: Help needed in breaking large index file into smaller ones
>> 
>> Hi All,
>> 
>>  My solr server has a few large index files (say ~10G). I am 
>> looking for some help on breaking them it into smaller ones (each < 
>> 4G) to satisfy my application requirements. Are there any such tools 
>> available?
>> 
>> Appreciate your help.
>> 
>> Thanks
>> NRC
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> ===
>> GPAA e-mail Disclaimers and confidential note
>> 
>> This e-mail is intended for the exclusive use of the addressee only.
>> If you are not the intended recipient, you should not use the 
>> contents or disclose them to any other person. Please notify the 
>> sender immediately and delete the e-mail. This e-mail is not intended 
>> nor shall it be taken to create any legal relations, contractual or 
>> otherwise.
>> Legally binding obligations can only arise for the GPAA by means of a 
>> written instrument signed by an authorised signatory.
>> 
>> ===
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> NOTE: This message may contain information that is confidential, 
>> proprietary, privileged or otherwise protected by law. The message is 
>> intended solely for the named addressee. If received in error, please 
>> destroy and notify the sender. Any use of this email is prohibited 
>> when received in error. Impetus does not represent, warrant and/or 
>> guarantee, that the integrity of this communication has been 
>> maintained nor that the communication is free of errors, virus, interception 
>> or interference.
>> 













can we customize SOLR search for IBM Filenet 5.2?

2017-01-09 Thread puneetmishra2555
can we customize SOLR search for IBM Filenet 5.2?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/can-we-customize-SOLR-search-for-IBM-Filenet-5-2-tp4313091.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud and LVM

2017-01-09 Thread billnbell
Yeah, we normally take the index size on disk and then double it to get the amount 
of memory for the machine...

For example we have 28gb on disk and we see great perf at 64gb ram.

If you can do that you will probably get good results. Remember to not give 
Java much memory. We set it at 12gb. We call it starving Java and it reduces 
the time to garbage collect to small increments.


Bill Bell
Sent from mobile


> On Jan 9, 2017, at 7:56 AM, Chris Ulicny  wrote:
> 
> That's good to hear. I didn't think there would be any reason that using
> lvm would impact solr's performance but wanted to see if there was anything
> I've missed.
> 
> As far as other performance goes, we use pcie and sata solid state drives
> since the indexes are mostly too large to cache entirely in memory, and we
> haven't had any performance problems so far. So I'm not expecting that to
> change too much when moving the cloud architecture.
> 
> Thanks again.
> 
> 
>> On Thu, Jan 5, 2017 at 7:55 PM Shawn Heisey  wrote:
>> 
>>> On 1/5/2017 3:12 PM, Chris Ulicny wrote:
>>> Is there any known significant performance impact of running solrcloud
>> with
>>> lvm on linux?
>>> 
>>> While migrating to solrcloud we don't have the storage capacity for our
>>> expected final size, so we are planning on setting up the solrcloud
>>> instances on a logical volume that we can grow when hardware becomes
>>> available.
>> 
>> Nothing specific.  Whatever the general performance impacts for LVM are
>> is what Solr would encounter when it reads and writes data to/from the
>> disk.
>> 
>> If your system has enough memory for good performance, then disk reads
>> will be rare, so the performance of the storage volume wouldn't matter
>> much.  If you don't have enough memory, then the disk performance would
>> matter ...although Solr's performance at that point would probably be
>> bad enough that you'd be looking for ways to improve it.
>> 
>> Here's some information:
>> 
>> https://wiki.apache.org/solr/SolrPerformanceProblems
>> 
>> Exactly how much memory is enough depends on enough factors that there's
>> no good general advice.  The only thing we can say in general is to
>> recommend the ideal setup -- where you have enough spare memory that
>> your OS can cache the ENTIRE index.  The ideal setup is usually not
>> required for good performance.
>> 
>> Thanks,
>> Shawn
>> 
>> 


Facet date Range without start and and date

2017-01-09 Thread nabil Kouici
Hi All,
Is it possible to have a facet date range without specifying the start and end of the 
range?
Otherwise, is it possible, in the same request, to set the start to the min value and 
the end to the max value?
Thank you.
Regards, NKI.


Available

2017-01-09 Thread billnbell
I am available for consulting projects if your project needs help.

Been doing Solr work for 6 years...

Bill Bell
Sent from mobile



Re: Question about Lucene FieldCache

2017-01-09 Thread billnbell
Try disabling and perf may get better 

Bill Bell
Sent from mobile


> On Jan 9, 2017, at 6:41 AM, Yago Riveiro  wrote:
> 
> The documentation says that the only caches configurable are:
> 
> - filterCache
> - queryResultCache
> - documentCache
> - user defined caches
> 
> There is no entry for fieldValueCache and in my case all of list in the 
> documentation are disable ...
> 
> --
> 
> /Yago Riveiro
> 
>> On 9 Jan 2017 13:20 +, Mikhail Khludnev , wrote:
>>> On Mon, Jan 9, 2017 at 2:17 PM, Yago Riveiro  wrote:
>>> 
>>> Thanks for re reply Mikhail,
>>> 
>>> Do you know if the 1 value is configurable?
>> 
>> yes. in solrconfig.xml
>> https://cwiki.apache.org/confluence/display/solr/Query+Settings+in+SolrConfig#QuerySettingsinSolrConfig-Caches
>> iirc you cant' fully disable it setting size to 0.
>> 
>> 
>>> My insert rate is so high
>>> (5000 docs/s) that the cache it's quite useless.
>>> 
>>> In the case of the Lucene field cache, it's possible "clean" it in some
>>> way?
>>> 
>>> Even it would be possible, the first sorting query or so loads it back.
>> 
>>> Some cache is eating my memory heap.
>>> 
>> Probably you need to dedicate master which won't load FieldCache.
>> 
>> 
>>> 
>>> 
>>> 
>>> -
>>> Best regards
>>> 
>>> /Yago
>>> --
>>> View this message in context: http://lucene.472066.n3.
>>> nabble.com/Question-about-Lucene-FieldCache-tp4313062p4313069.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>> 
>> 
>> 
>> 
>> --
>> Sincerely yours
>> Mikhail Khludnev


Re: Help needed in breaking large index file into smaller ones

2017-01-09 Thread billnbell
Can you set Solr config segments to a higher number, don't optimize and you 
will get smaller files after a new index is created.

Can you reindex ?

Bill Bell
Sent from mobile


> On Jan 9, 2017, at 7:15 AM, Narsimha Reddy CHALLA  
> wrote:
> 
> No, it does not work by splitting. First of all lucene index files are not
> text files. There is a segment_NN file which will refer index files in a
> commit. So, when we split a large index file into smaller ones, the
> corresponding segment_NN file also needs to be updated with new index files
> OR a new segment_NN file should be created, probably.
> 
> Can someone who is familiar with lucene index files please help us in this
> regard?
> 
> Thanks
> NRC
> 
> On Mon, Jan 9, 2017 at 7:38 PM, Manan Sheth 
> wrote:
> 
>> Is this really works for lucene index files?
>> 
>> Thanks,
>> Manan Sheth
>> 
>> From: Moenieb Davids 
>> Sent: Monday, January 9, 2017 7:36 PM
>> To: solr-user@lucene.apache.org
>> Subject: RE: Help needed in breaking large index file into smaller ones
>> 
>> Hi,
>> 
>> Try split on linux or unix
>> 
>> split -l 100 originalfile.csv
>> this will split a file into 100 lines each
>> 
>> see other options for how to split like size
>> 
>> 
>> -Original Message-
>> From: Narsimha Reddy CHALLA [mailto:chnredd...@gmail.com]
>> Sent: 09 January 2017 12:12 PM
>> To: solr-user@lucene.apache.org
>> Subject: Help needed in breaking large index file into smaller ones
>> 
>> Hi All,
>> 
>>  My solr server has a few large index files (say ~10G). I am looking
>> for some help on breaking them it into smaller ones (each < 4G) to satisfy
>> my application requirements. Are there any such tools available?
>> 
>> Appreciate your help.
>> 
>> Thanks
>> NRC
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> ===
>> GPAA e-mail Disclaimers and confidential note
>> 
>> This e-mail is intended for the exclusive use of the addressee only.
>> If you are not the intended recipient, you should not use the contents
>> or disclose them to any other person. Please notify the sender immediately
>> and delete the e-mail. This e-mail is not intended nor
>> shall it be taken to create any legal relations, contractual or otherwise.
>> Legally binding obligations can only arise for the GPAA by means of
>> a written instrument signed by an authorised signatory.
>> 
>> ===
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> NOTE: This message may contain information that is confidential,
>> proprietary, privileged or otherwise protected by law. The message is
>> intended solely for the named addressee. If received in error, please
>> destroy and notify the sender. Any use of this email is prohibited when
>> received in error. Impetus does not represent, warrant and/or guarantee,
>> that the integrity of this communication has been maintained nor that the
>> communication is free of errors, virus, interception or interference.
>> 


Re: CDCR How to recover from Corrupted transaction log

2017-01-09 Thread Webster Homer
The root cause was the aggressive logging filling up the file system. Our
admins have the logs on the same file system as the data, so when the
filesystem got full, Solr couldn't write to the transaction logs, which
corrupted them.

Thank you for the tips on recovery, I will forward them to our admins

Web

On Fri, Jan 6, 2017 at 12:43 PM, Shawn Heisey  wrote:

> On 1/6/2017 10:09 AM, Webster Homer wrote:
> > This happened while testing and was not in a production system. So we
> > just deleted both collections and recreated them after fixing the root
> > cause. If this had been a production system that would not have been
> > acceptable. What is the best way to recover from a problem like this?
> > Stop cdcr and delete the tlog files?
>
> What was the root cause?  Need to know that before anyone can tell you
> whether or not you've run into a bug.
>
> If it was the problem you've separately described where CDCR logging
> filled up your disk ... handling that gracefully in a program is very
> difficult.  It's possible, but there's very little incentive for anyone
> to attempt it.  Lucene and Solr have a general requirement of plenty of
> free disk space (enough for the index to triple in size temporarily)
> just for normal operation, so coding for disk space exhaustion isn't
> likely to happen.  Server monitoring should send an alarm when disk
> space gets low so you can fix it before it causes real problems.
>
> Thanks,
> Shawn
>
>

-- 


This message and any attachment are confidential and may be privileged or 
otherwise protected from disclosure. If you are not the intended recipient, 
you must not copy this message or attachment or disclose the contents to 
any other person. If you have received this transmission in error, please 
notify the sender immediately and delete the message and any attachment 
from your system. Merck KGaA, Darmstadt, Germany and any of its 
subsidiaries do not accept liability for any omissions or errors in this 
message which may arise as a result of E-Mail-transmission or for damages 
resulting from any unauthorized changes of the content of this message and 
any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its 
subsidiaries do not guarantee that this message is free of viruses and does 
not accept liability for any damages caused by any virus transmitted 
therewith.

Click http://www.emdgroup.com/disclaimer to access the German, French, 
Spanish and Portuguese versions of this disclaimer.


CloudSolrStream can't set the setZkClientTimeout and setZkConnectTimeout properties

2017-01-09 Thread Yago Riveiro
Hi,

Using CloudSolrStream, is it possible to define the setZkConnectTimeout and
setZkClientTimeout of the internal CloudSolrClient?

The default negotiation timeout is set to 10 seconds.

Regards,

/Yago



-
Best regards

/Yago
--
View this message in context: 
http://lucene.472066.n3.nabble.com/CloudSolrStream-can-t-set-the-setZkClientTimeout-and-setZkConnectTimeout-properties-tp4313127.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud and LVM

2017-01-09 Thread Chris Ulicny
That's good to hear. I didn't think there would be any reason that using
LVM would impact Solr's performance, but I wanted to see if there was anything
I'd missed.

As far as other performance goes, we use PCIe and SATA solid state drives
since the indexes are mostly too large to cache entirely in memory, and we
haven't had any performance problems so far. So I'm not expecting that to
change too much when moving to the cloud architecture.

Thanks again.


On Thu, Jan 5, 2017 at 7:55 PM Shawn Heisey  wrote:

> On 1/5/2017 3:12 PM, Chris Ulicny wrote:
> > Is there any known significant performance impact of running solrcloud
> with
> > lvm on linux?
> >
> > While migrating to solrcloud we don't have the storage capacity for our
> > expected final size, so we are planning on setting up the solrcloud
> > instances on a logical volume that we can grow when hardware becomes
> > available.
>
> Nothing specific.  Whatever the general performance impacts for LVM are
> is what Solr would encounter when it reads and writes data to/from the
> disk.
>
> If your system has enough memory for good performance, then disk reads
> will be rare, so the performance of the storage volume wouldn't matter
> much.  If you don't have enough memory, then the disk performance would
> matter ...although Solr's performance at that point would probably be
> bad enough that you'd be looking for ways to improve it.
>
> Here's some information:
>
> https://wiki.apache.org/solr/SolrPerformanceProblems
>
> Exactly how much memory is enough depends on enough factors that there's
> no good general advice.  The only thing we can say in general is to
> recommend the ideal setup -- where you have enough spare memory that
> your OS can cache the ENTIRE index.  The ideal setup is usually not
> required for good performance.
>
> Thanks,
> Shawn
>
>


Re: Help needed in breaking large index file into smaller ones

2017-01-09 Thread Yago Riveiro
You can try to reindex your data to another collection with more shards

--

/Yago Riveiro

On 9 Jan 2017 14:15 +, Narsimha Reddy CHALLA , wrote:
> No, it does not work by splitting. First of all lucene index files are not
> text files. There is a segment_NN file which will refer index files in a
> commit. So, when we split a large index file into smaller ones, the
> corresponding segment_NN file also needs to be updated with new index files
> OR a new segment_NN file should be created, probably.
>
> Can someone who is familiar with lucene index files please help us in this
> regard?
>
> Thanks
> NRC
>
> On Mon, Jan 9, 2017 at 7:38 PM, Manan Sheth  wrote:
>
> > Is this really works for lucene index files?
> >
> > Thanks,
> > Manan Sheth
> > 
> > From: Moenieb Davids  > Sent: Monday, January 9, 2017 7:36 PM
> > To: solr-user@lucene.apache.org
> > Subject: RE: Help needed in breaking large index file into smaller ones
> >
> > Hi,
> >
> > Try split on linux or unix
> >
> > split -l 100 originalfile.csv
> > this will split a file into 100 lines each
> >
> > see other options for how to split like size
> >
> >
> > -Original Message-
> > From: Narsimha Reddy CHALLA [mailto:chnredd...@gmail.com]
> > Sent: 09 January 2017 12:12 PM
> > To: solr-user@lucene.apache.org
> > Subject: Help needed in breaking large index file into smaller ones
> >
> > Hi All,
> >
> > My solr server has a few large index files (say ~10G). I am looking
> > for some help on breaking them it into smaller ones (each < 4G) to satisfy
> > my application requirements. Are there any such tools available?
> >
> > Appreciate your help.
> >
> > Thanks
> > NRC
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > 
> > ===
> > GPAA e-mail Disclaimers and confidential note
> >
> > This e-mail is intended for the exclusive use of the addressee only.
> > If you are not the intended recipient, you should not use the contents
> > or disclose them to any other person. Please notify the sender immediately
> > and delete the e-mail. This e-mail is not intended nor
> > shall it be taken to create any legal relations, contractual or otherwise.
> > Legally binding obligations can only arise for the GPAA by means of
> > a written instrument signed by an authorised signatory.
> > 
> > ===
> >
> > 
> >
> >
> >
> >
> >
> >
> > NOTE: This message may contain information that is confidential,
> > proprietary, privileged or otherwise protected by law. The message is
> > intended solely for the named addressee. If received in error, please
> > destroy and notify the sender. Any use of this email is prohibited when
> > received in error. Impetus does not represent, warrant and/or guarantee,
> > that the integrity of this communication has been maintained nor that the
> > communication is free of errors, virus, interception or interference.
> >


Re: Help needed in breaking large index file into smaller ones

2017-01-09 Thread Narsimha Reddy CHALLA
No, it does not work by splitting. First of all, Lucene index files are not
text files. There is a segment_NN file which refers to the index files in a
commit. So, when we split a large index file into smaller ones, the
corresponding segment_NN file also needs to be updated with the new index files,
OR a new segment_NN file should be created, probably.

Can someone who is familiar with lucene index files please help us in this
regard?

Thanks
NRC

On Mon, Jan 9, 2017 at 7:38 PM, Manan Sheth 
wrote:

> Is this really works for lucene index files?
>
> Thanks,
> Manan Sheth
> 
> From: Moenieb Davids 
> Sent: Monday, January 9, 2017 7:36 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Help needed in breaking large index file into smaller ones
>
> Hi,
>
> Try split on linux or unix
>
> split -l 100 originalfile.csv
> this will split a file into 100 lines each
>
> see other options for how to split like size
>
>
> -Original Message-
> From: Narsimha Reddy CHALLA [mailto:chnredd...@gmail.com]
> Sent: 09 January 2017 12:12 PM
> To: solr-user@lucene.apache.org
> Subject: Help needed in breaking large index file into smaller ones
>
> Hi All,
>
>   My solr server has a few large index files (say ~10G). I am looking
> for some help on breaking them it into smaller ones (each < 4G) to satisfy
> my application requirements. Are there any such tools available?
>
> Appreciate your help.
>
> Thanks
> NRC
>
>
>
>
>
>
>
>
>
>
> 
> ===
> GPAA e-mail Disclaimers and confidential note
>
> This e-mail is intended for the exclusive use of the addressee only.
> If you are not the intended recipient, you should not use the contents
> or disclose them to any other person. Please notify the sender immediately
> and delete the e-mail. This e-mail is not intended nor
> shall it be taken to create any legal relations, contractual or otherwise.
> Legally binding obligations can only arise for the GPAA by means of
> a written instrument signed by an authorised signatory.
> 
> ===
>
> 
>
>
>
>
>
>
> NOTE: This message may contain information that is confidential,
> proprietary, privileged or otherwise protected by law. The message is
> intended solely for the named addressee. If received in error, please
> destroy and notify the sender. Any use of this email is prohibited when
> received in error. Impetus does not represent, warrant and/or guarantee,
> that the integrity of this communication has been maintained nor that the
> communication is free of errors, virus, interception or interference.
>


Re: Help needed in breaking large index file into smaller ones

2017-01-09 Thread Manan Sheth
Does this really work for Lucene index files?

Thanks,
Manan Sheth

From: Moenieb Davids 
Sent: Monday, January 9, 2017 7:36 PM
To: solr-user@lucene.apache.org
Subject: RE: Help needed in breaking large index file into smaller ones

Hi,

Try split on linux or unix

split -l 100 originalfile.csv
this will split a file into 100 lines each

see other options for how to split like size


-Original Message-
From: Narsimha Reddy CHALLA [mailto:chnredd...@gmail.com]
Sent: 09 January 2017 12:12 PM
To: solr-user@lucene.apache.org
Subject: Help needed in breaking large index file into smaller ones

Hi All,

  My solr server has a few large index files (say ~10G). I am looking for 
some help on breaking them into smaller ones (each < 4G) to satisfy my 
application requirements. Are there any such tools available?

Appreciate your help.

Thanks
NRC










===
GPAA e-mail Disclaimers and confidential note

This e-mail is intended for the exclusive use of the addressee only.
If you are not the intended recipient, you should not use the contents
or disclose them to any other person. Please notify the sender immediately
and delete the e-mail. This e-mail is not intended nor
shall it be taken to create any legal relations, contractual or otherwise.
Legally binding obligations can only arise for the GPAA by means of
a written instrument signed by an authorised signatory.
===










IndexWriter.forceMerge not working as desired

2017-01-09 Thread Manan Sheth
Hi All,


While doing index merging through the IndexWriter.forceMerge method in Solr 6.2.1, 
I am passing the argument as 30, but it is still merging all the data (the earlier 
collection used to have 10 segments) into a single segment. Please provide some 
information to help in understanding this behaviour.
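
For reference, the call being made is essentially the following (a sketch; the
index path and analyzer are placeholders, not our actual setup):

import java.nio.file.Paths;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

public class ForceMergeExample {
    public static void main(String[] args) throws Exception {
        IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
        try (FSDirectory dir = FSDirectory.open(Paths.get("/path/to/index"));
             IndexWriter writer = new IndexWriter(dir, iwc)) {
            // forceMerge(30) only guarantees *at most* 30 segments afterwards;
            // the merge policy is free to end up with fewer.
            writer.forceMerge(30);
            writer.commit();
        }
    }
}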


Thanks,

Manan Sheth










RE: Help needed in breaking large index file into smaller ones

2017-01-09 Thread Moenieb Davids
Hi,

Try split on linux or unix

split -l 100 originalfile.csv
this will split the file into pieces of 100 lines each

see the other options for how to split, e.g. by size


-Original Message-
From: Narsimha Reddy CHALLA [mailto:chnredd...@gmail.com] 
Sent: 09 January 2017 12:12 PM
To: solr-user@lucene.apache.org
Subject: Help needed in breaking large index file into smaller ones

Hi All,

  My solr server has a few large index files (say ~10G). I am looking for 
some help on breaking them into smaller ones (each < 4G) to satisfy my 
application requirements. Are there any such tools available?

Appreciate your help.

Thanks
NRC












RE: How to integrate SOLR in ibm filenet 5.2.1?

2017-01-09 Thread Markus Jelsma
Apache ManifoldCF is probably your friend here:
http://manifoldcf.apache.org/en_US/index.html
 
 
-Original message-
> From: puneetmishra2555 
> Sent: Monday 9th January 2017 14:37
> To: solr-user@lucene.apache.org
> Subject: How to integrate SOLR in ibm filenet 5.2.1?
> 
> How can we integrate Solr with IBM FileNet 5.2?
> 
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-to-integrate-SOLR-in-ibm-filenet-5-2-1-tp4313090.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 


Re: Question about Lucene FieldCache

2017-01-09 Thread Yago Riveiro
The documentation says that the only caches configurable are:

- filterCache
- queryResultCache
- documentCache
- user defined caches

There is no entry for fieldValueCache, and in my case all of the caches listed in the 
documentation are disabled ...

--

/Yago Riveiro

On 9 Jan 2017 13:20 +, Mikhail Khludnev , wrote:
> On Mon, Jan 9, 2017 at 2:17 PM, Yago Riveiro  wrote:
>
> > Thanks for the reply, Mikhail,
> >
> > Do you know if the 1 value is configurable?
>
> yes. in solrconfig.xml
> https://cwiki.apache.org/confluence/display/solr/Query+Settings+in+SolrConfig#QuerySettingsinSolrConfig-Caches
> iirc you can't fully disable it by setting size to 0.
>
>
> > My insert rate is so high
> > (5000 docs/s) that the cache is quite useless.
> >
> > In the case of the Lucene field cache, is it possible to "clean" it in some
> > way?
> >
> > Even if it were possible, the first sorting query or so loads it back.
>
> > Some cache is eating my memory heap.
> >
> Probably you need a dedicated master which won't load the FieldCache.
>
>
> >
> >
> >
> > -
> > Best regards
> >
> > /Yago
> > --
> > View this message in context: http://lucene.472066.n3.
> > nabble.com/Question-about-Lucene-FieldCache-tp4313062p4313069.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev


How to integrate SOLR in ibm filenet 5.2.1?

2017-01-09 Thread puneetmishra2555
How can we integrate Solr with IBM FileNet 5.2?





--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-integrate-SOLR-in-ibm-filenet-5-2-1-tp4313090.html
Sent from the Solr - User mailing list archive at Nabble.com.


term frequency solrj

2017-01-09 Thread huda barakat
Hi,
Can anybody help me? I need to get the term frequency for a specific field. I
use the techproducts example and I use this code:

//
import java.util.List;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.TermsResponse;

public class App3 {
public static void main(String[] args) throws Exception {

  String urlString = "http://localhost:8983/solr/techproducts";
  SolrClient solr = new HttpSolrClient.Builder(urlString).build();

  SolrQuery query = new SolrQuery();

  query.setQuery("*:*");

  query.setRequestHandler("terms");


  QueryResponse response = solr.query(query);
  System.out.println("numFound: " +
response.getResults().getNumFound());

  TermsResponse termResp = response.getTermsResponse();

  List<TermsResponse.Term> terms = termResp.getTerms("name");
  System.out.print("size=" + terms.size());


}
}


I get the following error :

Exception in thread "main" numFound: 32
java.lang.NullPointerException
at testPkg.App3.main(App3.java:29)


Thank you in advance,,,
Huda
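
A variant of the snippet above that explicitly enables the terms component is sketched below; getTermsResponse() returns null when the response carries no terms section, which would explain the NullPointerException. The /terms handler name, field and limit here are assumptions, and the sketch is untested against this setup:

import java.util.List;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.TermsResponse;

public class TermsExample {
    public static void main(String[] args) throws Exception {
        SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/techproducts").build();

        SolrQuery query = new SolrQuery();
        // Point at a /terms request handler and turn the terms component on,
        // otherwise the response contains no terms section at all.
        query.setRequestHandler("/terms");
        query.setTerms(true);
        query.addTermsField("name");
        query.setTermsLimit(10);

        QueryResponse response = solr.query(query);
        TermsResponse termsResponse = response.getTermsResponse();

        List<TermsResponse.Term> terms = termsResponse.getTerms("name");
        for (TermsResponse.Term term : terms) {
            // Each entry is a term in the "name" field with its document frequency.
            System.out.println(term.getTerm() + " -> " + term.getFrequency());
        }
        solr.close();
    }
}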


Re: Question about Lucene FieldCache

2017-01-09 Thread Mikhail Khludnev
On Mon, Jan 9, 2017 at 2:17 PM, Yago Riveiro  wrote:

> Thanks for the reply, Mikhail,
>
> Do you know if the 1 value is configurable?

yes. in solrconfig.xml
https://cwiki.apache.org/confluence/display/solr/Query+Settings+in+SolrConfig#QuerySettingsinSolrConfig-Caches
iirc you can't fully disable it by setting size to 0.


> My insert rate is so high
> (5000 docs/s) that the cache is quite useless.
>
> In the case of the Lucene field cache, is it possible to "clean" it in some
> way?
>
> Even if it were possible, the first sorting query or so loads it back.

> Some cache is eating my memory heap.
>
Probably you need a dedicated master which won't load the FieldCache.


>
>
>
> -
> Best regards
>
> /Yago
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/Question-about-Lucene-FieldCache-tp4313062p4313069.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Sincerely yours
Mikhail Khludnev


Re: Help needed in breaking large solr index file into smaller ones

2017-01-09 Thread Manan Sheth
Hi All,

I have a problem similar to this one, where the indexes in multiple Solr 
shards have created large index files (~10 GB each), and I want to split these 
large files on each shard into smaller files.

Please provide some guidelines.
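
For context, Lucene's TieredMergePolicy can cap the size of segments produced by future merges, although it will not rewrite or split segments that already exist; in Solr this is normally configured on the merge policy in solrconfig.xml rather than in code. A minimal, untested Lucene-level sketch (the 4 GB cap and index path are only examples):

import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.store.FSDirectory;

public class SegmentSizeCapExample {
    public static void main(String[] args) throws Exception {
        TieredMergePolicy mergePolicy = new TieredMergePolicy();
        // Segments produced by future merges stay under roughly 4 GB;
        // existing 10 GB segments are NOT rewritten or split by this setting.
        mergePolicy.setMaxMergedSegmentMB(4096.0);

        IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
        config.setMergePolicy(mergePolicy);

        try (FSDirectory dir = FSDirectory.open(Paths.get("/path/to/index"));
             IndexWriter writer = new IndexWriter(dir, config)) {
            // Subsequent indexing and merging on this writer respects the cap above.
            writer.commit();
        }
    }
}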

Thanks,
Manan Sheth

From: Narsimha Reddy CHALLA 
Sent: Monday, January 9, 2017 3:51 PM
To: solr-user@lucene.apache.org
Subject: Help needed in breaking large solr index file into smaller ones

Hi All,

  My solr server has a few large index files (say ~10G). I am looking
for some help on breaking them into smaller ones (each < 4G) to satisfy
my application requirements. Basically, I am not looking for any
optimization of index here (ex: optimize, expungeDeletes etc.).

Are there any such tools available?

Appreciate your help.

Thanks
NRC










Re: Question about Lucene FieldCache

2017-01-09 Thread Yago Riveiro
Thanks for the reply, Mikhail,

Do you know if the 1 value is configurable? My insert rate is so high
(5000 docs/s) that the cache is quite useless.

In the case of the Lucene field cache, is it possible to "clean" it in some way?

Some cache is eating my memory heap.



-
Best regards

/Yago
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Question-about-Lucene-FieldCache-tp4313062p4313069.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Question about Lucene FieldCache

2017-01-09 Thread Mikhail Khludnev
Hello, Yago.

"size": "1", "showItems": "-1", "initialSize": "10", "name":
"fieldValueCache"

These are Solr's UnInvertedFields, not Lucene's FieldCache.
That 1 is for all fields of the collection schema.
A collection reload or a commit drops all entries from this cache.


On Mon, Jan 9, 2017 at 1:30 PM, Yago Riveiro  wrote:

> Hi,
>
> After some reading of the documentation, supposedly the Lucene FieldCache
> is the only one that cannot be disabled.
>
> Fetching the config for a collection through the REST API I found an entry
> like this:
>
> "query": {
> "useFilterForSortedQuery": true,
> "queryResultWindowSize": 1,
> "queryResultMaxDocsCached": 0,
> "enableLazyFieldLoading": true,
> "maxBooleanClauses": 8192,
> "": {
> "size": "1",
> "showItems": "-1",
> "initialSize": "10",
> "name": "fieldValueCache"
> }
> },
>
> My questions:
>
> - Is that size, 1, for all fields of the collection schema, or is it 1
> for each field defined?
> - If I reload the collection, are the caches wiped?
>
> Regards,
>
> /Yago
>
>
>
> -
> Best regards
>
> /Yago
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/Question-about-Lucene-FieldCache-tp4313062.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Sincerely yours
Mikhail Khludnev


Question about Lucene FieldCache

2017-01-09 Thread Yago Riveiro
Hi,

After some reading of the documentation, supposedly the Lucene FieldCache
is the only one that cannot be disabled.

Fetching the config for a collection through the REST API I found an entry
like this:

"query": {
"useFilterForSortedQuery": true,
"queryResultWindowSize": 1,
"queryResultMaxDocsCached": 0,
"enableLazyFieldLoading": true,
"maxBooleanClauses": 8192,
"": {
"size": "1",
"showItems": "-1",
"initialSize": "10",
"name": "fieldValueCache"
}
},

My questions:

- Is that size, 1, for all fields of the collection schema, or is it 1 for
each field defined?
- If I reload the collection, are the caches wiped?

Regards,

/Yago



-
Best regards

/Yago
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Question-about-Lucene-FieldCache-tp4313062.html
Sent from the Solr - User mailing list archive at Nabble.com.


Help needed in breaking large solr index file into smaller ones

2017-01-09 Thread Narsimha Reddy CHALLA
Hi All,

  My solr server has a few large index files (say ~10G). I am looking
for some help on breaking them into smaller ones (each < 4G) to satisfy
my application requirements. Basically, I am not looking for any
optimization of index here (ex: optimize, expungeDeletes etc.).

Are there any such tools available?

Appreciate your help.

Thanks
NRC


Help needed in breaking large index file into smaller ones

2017-01-09 Thread Narsimha Reddy CHALLA
Hi All,

  My solr server has a few large index files (say ~10G). I am looking
for some help on breaking them into smaller ones (each < 4G) to satisfy
my application requirements. Are there any such tools available?

Appreciate your help.

Thanks
NRC


OnError CSV upload

2017-01-09 Thread Moenieb Davids
Hi All,

Background:
I have a mainframe file that I want to upload, and the data is pipe delimited.
Some of the records, however, have a few fields fewer than others within the same 
file, and when I try to import the file, Solr has an issue with the number of 
columns vs the number of values, which is correct.

Is there not a way, using the standard CSV upload, to continue on error and 
perhaps get a log of the failed records?














Re: Regarding /sql -- WHERE <> IS NULL and IS NOT NULL

2017-01-09 Thread Gethin James
For NOT NULL, I had some success using:


WHERE field_name <> '' (greater or less than empty quotes)


Best regards,

Gethin.


From: Joel Bernstein 
Sent: 05 January 2017 20:12:19
To: solr-user@lucene.apache.org
Subject: Re: Regarding /sql -- WHERE <> IS NULL and IS NOT NULL

IS NULL and IS NOT NULL predicate are not currently supported.

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Jan 5, 2017 at 2:05 PM, radha krishnan 
wrote:

> Hi,
>
> solr version : 6.3
>
> will WHERE <> IS NULL / IS NOT NULL work with the /sql handler
> ?
>
> " select   name from gettingstarted where name is not null "
>
> the above query is not returning any documents in the response even if
> there are documents with "name" defined
>
>
> Thanks,
> Radhakrishnan D
>


Re: SolrCloud different score for same document on different replicas.

2017-01-09 Thread Morten Bøgeskov
On Fri, 6 Jan 2017 10:45:02 -0600
Webster Homer  wrote:

> I was seeing something like this, and it turned out to be a problem with
> our autoCommit and autoSoftCommit settings. We had overly aggressive
> settings that eventually started failing with errors around too many
> warming searchers etc...
> 
> You can test this by doing a commit and seeing if the replicas start
> returning consistent results
> 

Commit changes nothing, since the number of deleted documents doesn't
change much.
Optimize makes ranking consistent across replicas for the time being,
until too many updates have hit the shard and increased the number of deleted
documents in the largest segment (it takes some time to prune due to a merge).
Optimizing hourly is not really an option.


> On Thu, Jan 5, 2017 at 10:31 AM, Charlie Hull  wrote:
> 
> > On 05/01/2017 13:30, Morten Bøgeskov wrote:
> >
> >>
> >>
> >> Hi.
> >>
> >> We've got a SolrCloud which is sharded and has a replication factor of
> >> 2.
> >>
> >> The 2 replicas of a shard may look like this:
> >>
> >> Num Docs:5401023
> >> Max Doc:6388614
> >> Deleted Docs:987591
> >>
> >>
> >> Num Docs:5401023
> >> Max Doc:5948122
> >> Deleted Docs:547099
> >>
> >> We've seen >10% difference in Max Doc at times with same Num Docs.
> >> Our use case is a few documents that are searched and many small ones that
> >> are filtered against (often updated multiple times a day), so the
> >> difference in deleted docs isn't surprising.
> >>
> >> This results in a different score for a document depending on which
> >> replica it comes from. As I see it: it has to do with the different
> >> maxDoc value when calculating idf.
> >>
> >> This in turn alters a specific document's position in the search
> >> result over reloads. This is quite confusing (duplicates in pagination).
> >>
> >> What is the trick to get a homogeneous score from different replicas?
> >> We've tried using ExactStatsCache & ExactSharedStatsCache, but that
> >> didn't seem to make any difference.
> >>
> >> Any hints to this will be greatly appreciated.
> >>
> >>
> > This was one of things we looked at during our recent Lucene London
> > Hackday (see item 3) https://github.com/flaxsearch/london-hackday-2016
> >
> > I'm not sure there is a way to get a homogeneous score - this patch tries
> > to keep you connected to the same replica during a session so you don't see
> > results jumping over pagination.
> >
> > Cheers
> >
> > Charlie
> >
> >
> > --
> > Charlie Hull
> > Flax - Open Source Enterprise Search
> >
> > tel/fax: +44 (0)8700 118334
> > mobile:  +44 (0)7767 825828
> > web: www.flax.co.uk
> >
> 



-- 
 Morten Bøgeskov