Re: Provide value to uniqueID

2014-06-08 Thread Shalin Shekhar Mangar
You can specify the file name as the id by adding a TemplateTransformer on
the entity "x" and specifying ${f.file} as the template value in the "id"
field. For example:
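A minimal data-config sketch of that idea (the inner PlainTextEntityProcessor and the
content field mapping are assumptions; the other attributes follow the config quoted
below):

<dataConfig>
  <dataSource type="FileDataSource"/>
  <document>
    <entity name="f" processor="FileListEntityProcessor"
            baseDir="F:\Work\Lucene\Solr\Solr Arabic Book" fileName=".txt"
            recursive="true" rootEntity="false">
      <entity name="x" processor="PlainTextEntityProcessor"
              url="${f.fileAbsolutePath}" transformer="TemplateTransformer">
        <!-- TemplateTransformer fills "id" with the plain file name -->
        <field column="id" template="${f.file}"/>
        <field column="plainText" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>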


On Mon, Jun 9, 2014 at 11:23 AM, ienjreny  wrote:

> Hello,
>
> I am using the following code to read text files
>
> <dataConfig>
>   <!-- dataSource, inner processor and field mapping assumed;
>        attribute values as posted -->
>   <dataSource type="FileDataSource"/>
>   <document>
>     <entity name="f" processor="FileListEntityProcessor"
>             baseDir="F:\Work\Lucene\Solr\Solr Arabic Book" fileName=".txt"
>             recursive="true" rootEntity="false">
>       <entity name="x" processor="PlainTextEntityProcessor"
>               url="${f.fileAbsolutePath}">
>         <field column="plainText" name="text"/>
>       </entity>
>     </entity>
>   </document>
> </dataConfig>
>
> It is working perfectly except for the id value. How can I use the file name (or
> any other value) as the value for the uniqueID field?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Provide-value-to-uniqueID-tp4140712.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: Customizing Solr; Where to draw the line?

2014-06-08 Thread Meraj A. Khan
Phanindra,


1. I had no such need; the index was built from scratch.
2. Yes , I created a new URL and mapped it to the customized search handler.
3. The only thing I modified was the SearchHandler, and I specified that as
the class for the newly created URL in #2 above.

Hope this helps.


On Mon, Jun 9, 2014 at 12:22 AM, Phanindra R  wrote:

> Thanks for the reply Meraj. It'd be great if you could share info on
> following as well.
>
> 1) Did you have to load any stuff from database? How did you do that?
>
> 2) Did you create new urls and map them to your custom handlers?
>
> 3) Did you declare any custom object types in the solr-config.xml i.e.
> other than , , etc.
>
>
> On Sun, Jun 8, 2014 at 7:42 PM, Meraj A. Khan  wrote:
>
> > I have gone with approach #2 to avoid latency issues as well, specifically
> > for spellcheck-related functionality. I have not moved on to SolrCloud yet,
> > so I cannot comment on the distributed search part.
> > On Jun 8, 2014 10:38 PM, "Phanindra R"  wrote:
> >
> > > Hi,
> > >
> > > We have decided to migrate from Lucene 3.x to latest Solr. A lot of
> > > architectural discussions are going on. There are two possible
> > approaches.
> > >
> > > Please note that our customer-facing app (or any client) and Search are
> > > hosted on different machines.
> > >
> > > *1) Have a clean architecture*
> > > - Solr takes care of customized search only.
> > >
> > >- We certainly have to override some filtering, scoring,etc.
> > >
> > > - There will be an intermediary search-app that
> > >
> > >- receives queries
> > >   - does a/b testing assignments, and other non-search stuff.
> > >   - does query expansion / rewriting (to avoid every Solr shard
> doing
> > >   that)
> > >   - transforms query into Solr syntax and uses Solr's http API to
> > >   consume it.
> > >   - returns the response to customer-facing app or whatever the
> > client
> > >   is.
> > >
> > >The problem with this approach is the additional layer and the
> latency
> > > between search-app and solr. The client of search has to make an API
> > call,
> > > across the network, to the intermediary search-app which in turns makes
> > > another Http API call to Solr.
> > >
> > > *2) Customize Solr to the full extent*
> > >
> > >- Do all the crazy stuff within Solr.
> > >- We can literally create a new url and register a handler class to
> > >process that. With some limitations, we should be able to do almost
> > >anything.
> > >
> > >  The benefit of this approach is that it obviates the additional
> > layer
> > > and the latency. However, I see a lot of long-term problems like hard
> to
> > > upgrade Solr's version, Dev flexibility (usage of Spring, Hib, etc.).
> > >
> > > How about a distributed search? Where do above approaches stand?
> > >
> > > I understand that this is a subjective question. It'd be helpful if you
> > > could share your thoughts and experiences.
> > >
> > > Thanks.
> > >
> >
>


Provide value to uniqueID

2014-06-08 Thread ienjreny
Hello,

I am using the following code to read text files:

<dataConfig>
  <!-- dataSource, inner processor and field mapping assumed;
       attribute values as in the copy quoted elsewhere in the thread -->
  <dataSource type="FileDataSource"/>
  <document>
    <entity name="f" processor="FileListEntityProcessor"
            baseDir="F:\Work\Lucene\Solr\Solr Arabic Book" fileName=".txt"
            recursive="true" rootEntity="false">
      <entity name="x" processor="PlainTextEntityProcessor"
              url="${f.fileAbsolutePath}">
        <field column="plainText" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>

It is working perfectly except for the id value. How can I use the file name (or
any other value) as the value for the uniqueID field?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Provide-value-to-uniqueID-tp4140712.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Delta Import Functionality

2014-06-08 Thread ajay59
Hi,

I tried the way you said, but it's still not working. I am sharing the
screenshots for your reference.

Thanks for help.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Delta-Import-Functionality-tp4140063p4140709.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Customizing Solr; Where to draw the line?

2014-06-08 Thread Phanindra R
Thanks for the reply, Meraj. It'd be great if you could share info on the
following as well.

1) Did you have to load any stuff from database? How did you do that?

2) Did you create new urls and map them to your custom handlers?

3) Did you declare any custom object types in the solrconfig.xml, i.e.
other than , , etc.?


On Sun, Jun 8, 2014 at 7:42 PM, Meraj A. Khan  wrote:

> I have gone with approach #2 to avoid latency issues as well, specifically
> for spellcheck-related functionality. I have not moved on to SolrCloud yet,
> so I cannot comment on the distributed search part.
> On Jun 8, 2014 10:38 PM, "Phanindra R"  wrote:
>
> > Hi,
> >
> > We have decided to migrate from Lucene 3.x to latest Solr. A lot of
> > architectural discussions are going on. There are two possible
> approaches.
> >
> > Please note that our customer-facing app (or any client) and Search are
> > hosted on different machines.
> >
> > *1) Have a clean architecture*
> > - Solr takes care of customized search only.
> >
> >- We certainly have to override some filtering, scoring,etc.
> >
> > - There will be an intermediary search-app that
> >
> >- receives queries
> >   - does a/b testing assignments, and other non-search stuff.
> >   - does query expansion / rewriting (to avoid every Solr shard doing
> >   that)
> >   - transforms query into Solr syntax and uses Solr's http API to
> >   consume it.
> >   - returns the response to customer-facing app or whatever the
> client
> >   is.
> >
> >The problem with this approach is the additional layer and the latency
> > between search-app and solr. The client of search has to make an API
> call,
> > across the network, to the intermediary search-app which in turns makes
> > another Http API call to Solr.
> >
> > *2) Customize Solr to the full extent*
> >
> >- Do all the crazy stuff within Solr.
> >- We can literally create a new url and register a handler class to
> >process that. With some limitations, we should be able to do almost
> >anything.
> >
> >  The benefit of this approach is that it obviates the additional
> layer
> > and the latency. However, I see a lot of long-term problems like hard to
> > upgrade Solr's version, Dev flexibility (usage of Spring, Hib, etc.).
> >
> > How about a distributed search? Where do above approaches stand?
> >
> > I understand that this is a subjective question. It'd be helpful if you
> > could share your thoughts and experiences.
> >
> > Thanks.
> >
>


Re: Extract values from custom function for ValueSource with multiple indexable fields

2014-06-08 Thread david.w.smi...@gmail.com
I suggest investigating this using a known example that does this, such as
LatLonType and geodist().  LatLonType registers the field in a custom way
too.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley


On Sun, Jun 8, 2014 at 7:54 AM, Costi Muraru  wrote:

> Hi guys,
>
> I have a custom FieldType that adds several IndexableFields for each
> document.
> I also have a custom function, in which I want to retrieve these indexable
> fields. I can't seem to be able to do so. I have added some code snippets
> below.
> Any help is gladly appreciated.
>
> Thanks,
> Costi
>
> public class MyField extends FieldType {
>     @Override
>     public final java.util.List<IndexableField> createFields(SchemaField field,
>             Object val, float boost) {
>         List<IndexableField> result = new ArrayList<IndexableField>();
>         result.add(new Field(field.getName(), "field1", FIELD_TYPE));
>         result.add(new Field(field.getName(), "123", FIELD_TYPE));
>         result.add(new Field(field.getName(), "ABC", FIELD_TYPE));
>         return result;
>     }
> }
>
>
> public class MyFunctionParser extends ValueSourceParser {
>     @Override
>     public ValueSource parse(FunctionQParser fqp) throws SyntaxError {
>         ValueSource fieldName = fqp.parseValueSource();
>         return new MyFunction(fieldName);
>     }
> }
>
> public class MyFunction extends ValueSource {
>     ...
>     @Override
>     public FunctionValues getValues(Map context, AtomicReaderContext readerContext)
>             throws IOException {
>         final FunctionValues values = valueSource.getValues(context, readerContext);
>         LOG.debug("Value is: " + values.strVal(doc)); // prints "123" - how can I
>         // retrieve the "field1" and "ABC" indexable fields as well?
>     }
> }
>
>


Re: Customizing Solr; Where to draw the line?

2014-06-08 Thread Meraj A. Khan
I have gone with approach #2 to avoid latency issues as well, specifically
for spellcheck-related functionality. I have not moved on to SolrCloud yet,
so I cannot comment on the distributed search part.
On Jun 8, 2014 10:38 PM, "Phanindra R"  wrote:

> Hi,
>
> We have decided to migrate from Lucene 3.x to latest Solr. A lot of
> architectural discussions are going on. There are two possible approaches.
>
> Please note that our customer-facing app (or any client) and Search are
> hosted on different machines.
>
> *1) Have a clean architecture*
> - Solr takes care of customized search only.
>
>- We certainly have to override some filtering, scoring,etc.
>
> - There will be an intermediary search-app that
>
>- receives queries
>   - does a/b testing assignments, and other non-search stuff.
>   - does query expansion / rewriting (to avoid every Solr shard doing
>   that)
>   - transforms query into Solr syntax and uses Solr's http API to
>   consume it.
>   - returns the response to customer-facing app or whatever the client
>   is.
>
>The problem with this approach is the additional layer and the latency
> between search-app and solr. The client of search has to make an API call,
> across the network, to the intermediary search-app which in turns makes
> another Http API call to Solr.
>
> *2) Customize Solr to the full extent*
>
>- Do all the crazy stuff within Solr.
>- We can literally create a new url and register a handler class to
>process that. With some limitations, we should be able to do almost
>anything.
>
>  The benefit of this approach is that it obviates the additional layer
> and the latency. However, I see a lot of long-term problems like hard to
> upgrade Solr's version, Dev flexibility (usage of Spring, Hib, etc.).
>
> How about a distributed search? Where do above approaches stand?
>
> I understand that this is a subjective question. It'd be helpful if you
> could share your thoughts and experiences.
>
> Thanks.
>


Customizing Solr; Where to draw the line?

2014-06-08 Thread Phanindra R
Hi,

We have decided to migrate from Lucene 3.x to latest Solr. A lot of
architectural discussions are going on. There are two possible approaches.

Please note that our customer-facing app (or any client) and Search are
hosted on different machines.

*1) Have a clean architecture*
- Solr takes care of customized search only.

   - We certainly have to override some filtering, scoring, etc.

- There will be an intermediary search-app that

   - receives queries
  - does a/b testing assignments, and other non-search stuff.
  - does query expansion / rewriting (to avoid every Solr shard doing
  that)
   - transforms the query into Solr syntax and uses Solr's HTTP API to
   consume it.
  - returns the response to customer-facing app or whatever the client
  is.

   The problem with this approach is the additional layer and the latency
between the search-app and Solr. The search client has to make an API call
across the network to the intermediary search-app, which in turn makes
another HTTP API call to Solr.

*2) Customize Solr to the full extent*

   - Do all the crazy stuff within Solr.
   - We can literally create a new url and register a handler class to
   process that. With some limitations, we should be able to do almost
   anything.

 The benefit of this approach is that it obviates the additional layer
and the latency. However, I see a lot of long-term problems, such as difficulty
upgrading Solr's version and reduced dev flexibility (usage of Spring, Hibernate, etc.).

How about a distributed search? Where do the above approaches stand?

I understand that this is a subjective question. It'd be helpful if you
could share your thoughts and experiences.

Thanks.


Re: Any way to view lucene files

2014-06-08 Thread Alexandre Rafalovitch
Have you looked at:
https://github.com/DmitryKey/luke

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Mon, Jun 9, 2014 at 8:12 AM, Aman Tandon  wrote:
> I guess this is not available now. I am trying to download from the google,
> please take a look https://code.google.com/p/luke/downloads/list
>
> If you have any link please share
>
> With Regards
> Aman Tandon
>
>
> On Sat, Jun 7, 2014 at 10:32 PM, Summer Shire  wrote:
>
>>
>> Did u try  luke 47
>>
>>
>>
>> > On Jun 6, 2014, at 11:59 PM, Aman Tandon 
>> wrote:
>> >
>> > I also tried with solr 4.2 and with luke version Luke 4.0.0-ALPHA
>> >
>> > but got this error:
>> > java.lang.IllegalArgumentException: A SPI class of type
>> > org.apache.lucene.codecs.Codec with name 'Lucene42' does not exist. You
>> > need to add the corresponding JAR file supporting this SPI to your
>> > classpath.The current classpath supports the following names: [Lucene40,
>> > Lucene3x, SimpleText, Appending]
>> >
>> > With Regards
>> > Aman Tandon
>> >
>> >
>> > On Sat, Jun 7, 2014 at 12:22 PM, Aman Tandon 
>> > wrote:
>> >
>> >> My solr version is 4.8.1 and luke is 3.5
>> >>
>> >> With Regards
>> >> Aman Tandon
>> >>
>> >>
>> >> On Sat, Jun 7, 2014 at 12:21 PM, Chris Collins 
>> >> wrote:
>> >>
>> >>> What version of Solr / Lucene are you using?  You have to match the
>> Luke
>> >>> version to the same version of Lucene.
>> >>>
>> >>> C
>>  On Jun 6, 2014, at 11:42 PM, Aman Tandon 
>> wrote:
>> 
>>  Yes, I tried, but it is not working at all; every time I choose my index
>>  directory it shows me EOF past
>> 
>>  With Regards
>>  Aman Tandon
>> 
>> 
>> > On Sat, Jun 7, 2014 at 12:01 PM, Chris Collins > >
>>  wrote:
>> 
>> > Have you tried:
>> >
>> > https://code.google.com/p/luke/
>> >
>> > Best
>> >
>> > Chris
>> > On Jun 6, 2014, at 11:24 PM, Aman Tandon 
>> >>> wrote:
>> >
>> >> Hi,
>> >>
>> >> Is there any way so that i can view what information and which is
>> >>> there
>> > in
>> >> my _e.fnm, etc files. may be with the help of any application or any
>> > viewer
>> >> tool.
>> >>
>> >> With Regards
>> >> Aman Tandon
>> >>
>>


Re: Any way to view lucene files

2014-06-08 Thread Aman Tandon
I guess this is not available now. I am trying to download it from Google;
please take a look: https://code.google.com/p/luke/downloads/list

If you have any link, please share it.

With Regards
Aman Tandon


On Sat, Jun 7, 2014 at 10:32 PM, Summer Shire  wrote:

>
> Did u try  luke 47
>
>
>
> > On Jun 6, 2014, at 11:59 PM, Aman Tandon 
> wrote:
> >
> > I also tried with solr 4.2 and with luke version Luke 4.0.0-ALPHA
> >
> > but got this error:
> > java.lang.IllegalArgumentException: A SPI class of type
> > org.apache.lucene.codecs.Codec with name 'Lucene42' does not exist. You
> > need to add the corresponding JAR file supporting this SPI to your
> > classpath.The current classpath supports the following names: [Lucene40,
> > Lucene3x, SimpleText, Appending]
> >
> > With Regards
> > Aman Tandon
> >
> >
> > On Sat, Jun 7, 2014 at 12:22 PM, Aman Tandon 
> > wrote:
> >
> >> My solr version is 4.8.1 and luke is 3.5
> >>
> >> With Regards
> >> Aman Tandon
> >>
> >>
> >> On Sat, Jun 7, 2014 at 12:21 PM, Chris Collins 
> >> wrote:
> >>
> >>> What version of Solr / Lucene are you using?  You have to match the
> Luke
> >>> version to the same version of Lucene.
> >>>
> >>> C
>  On Jun 6, 2014, at 11:42 PM, Aman Tandon 
> wrote:
> 
>  Yes, I tried, but it is not working at all; every time I choose my index
>  directory it shows me EOF past
> 
>  With Regards
>  Aman Tandon
> 
> 
> > On Sat, Jun 7, 2014 at 12:01 PM, Chris Collins  >
>  wrote:
> 
> > Have you tried:
> >
> > https://code.google.com/p/luke/
> >
> > Best
> >
> > Chris
> > On Jun 6, 2014, at 11:24 PM, Aman Tandon 
> >>> wrote:
> >
> >> Hi,
> >>
> >> Is there any way so that i can view what information and which is
> >>> there
> > in
> >> my _e.fnm, etc files. may be with the help of any application or any
> > viewer
> >> tool.
> >>
> >> With Regards
> >> Aman Tandon
> >>
>


Re: Performance/scaling with custom function queries

2014-06-08 Thread Joel Bernstein
You only need to have fast access to the fingerprint field so only that
field needs to be in memory. You'll want to review how Lucene DocValues and
FieldCache work. Sorting is done with a PriorityQueue so only the top N
docs are kept in memory.

You'll only need to access the fingerprint field values for documents that
match the query, so it won't be a full table scan unless all the docs match
the query.
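
To put rough numbers on the example from the original mail: at ~100 bytes of
fingerprint per document, a million documents works out to on the order of
100 MB of fingerprint data to keep hot (plus DocValues/FieldCache overhead),
not the 1 GB that would also include the other ~900 bytes of indexed fields
per document.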

Sounds like an interesting project. Please keep us posted.

Joel Bernstein
Search Engineer at Heliosearch


On Sun, Jun 8, 2014 at 6:17 AM, Robert Krüger  wrote:

> Hi,
>
> let's say I have an index that contains a field of type BinaryField
> called "fingerprint" that stores a few (let's say 100) bytes that are
> some kind of digital fingerprint-like thing.
>
> Let's say I want to perform queries on that field to achieve sorting
> or filtering based on a kind of custom distance function
> "customDistance", i.e. I input a reference "fingerprint" and Solr
> returns either all documents sorted by
> customDistance(,) or use
> that in an frange expression for filtering.
>
> I have read http://wiki.apache.org/solr/SolrPerformanceFactors and I
> do understand that using function queries with a custom function is
> definitely an expensive thing as it will result in what is called a
> "full table scan" in the sql world, i.e. data from all documents needs
> to be touched to select the correct documents or sort by a function's
> result.
>
> Given all that and provided, I have to use a custom function for my
> needs, I would like to know a few more details about solr architecture
> to understand what I have to look out for.
>
> I will have potentially millions of records. Does the data contained
> in other index fields play a role when I only use the "fingerprint"
> field for sorting and searching when it comes to RAM usage? I am
> hoping to calculate that my RAM should be able to accommodate the
> fingerprint data of all available documents for the queries to be fast
> but not fingerprint data and all other indexed or stored data.
>
> Example: My fingerprint data needs 100bytes per document, my other
> indexed field data needs 900 bytes per document. Will I need 100MB or
> 1GB to fit all data that is needed to process one query in memory?
>
> Are there other things to be aware of?
>
> Thanks,
>
> Robert
>


Re: Is it possible for solr to calculate and give back the price of a product based on its sub-products

2014-06-08 Thread S.L
I am not sure if that is doable; I think it needs to be taken care of at
indexing time.
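
For instance, a minimal SolrJ sketch of that index-time approach (the core name,
field names and prices below are made up for illustration, not taken from the thread):

import java.util.Arrays;
import java.util.List;

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class IndexParentWithMinPrice {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/products");

        // prices of the sub-products m1 and m2
        List<Double> itemPrices = Arrays.asList(19.90, 14.50);

        double minPrice = Double.MAX_VALUE;
        for (double p : itemPrices) {
            minPrice = Math.min(minPrice, p);
        }

        // store the derived value as a plain field on the parent product P1,
        // so it can be searched, sorted and faceted like any other field
        SolrInputDocument parent = new SolrInputDocument();
        parent.addField("id", "P1");
        parent.addField("price", minPrice);
        solr.add(parent);
        solr.commit();
    }
}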


On Sun, Jun 8, 2014 at 4:55 PM, Gharbi Mohamed <
gharbi.mohamed.e...@gmail.com> wrote:

> Hi,
>
> I am using Solr for searching magento products in my project,
> I want to know, is it possible for solr to calculate and give back the
> price
> of a product based on its sub-products(items);
>
> For instance, i have a product P1 and it is the parent of items m1, m2.
> i need to get the minimal price of items and return it as a price of
> product
> P1.
>
> I'm wondering if that is possible ?
> I need to know if solr can do that or if there is a feature or a way to do
> it ?
> And finally i thank you!
>
> regards,
> Mohamed.
>
>


Setup a Solr Cloud on a set of powerful machines

2014-06-08 Thread shushuai zhu
Hi,

I would like to get some advice on setting up a Solr Cloud on a set of powerful
machines. The average size of the documents handled by the Solr Cloud is about
0.5 KB, and the number of documents stored in Solr Cloud could reach billions.
When indexing, the incoming document rate could be as high as 20k/second, and
the major query operations performed on the cloud are searching, faceting, and
some other aggregations. There will NOT be many concurrent queries (a replication
factor of 2 may be good enough), but some queries could cover a big range of
documents.

As an example, I have 8 powerful machines (nodes), and each machine (node) has:

16 CPU cores
256GB RAM
48TB physical disk space

The Solr Cloud may be setup in following different ways (assuming replication 
factor is 2):

1) 8 shards on 8 Solr servers, total 16 cores (including replicas)
Each machine (node) holds one Solr server (JVM), and each Solr server has one 
shard. 

2) 32 shards on 8 Solr servers, total 64 cores (including replicas)
Each machine (node) holds one Solr server (JVM), and each Solr server has 4 
shards. 

3) 32 shards on 16 Solr servers, total 64 cores (including replicas)
Each machine (node) holds 2 Solr servers (JVMs), and each Solr server has 2 
shards.

4) 64 shards on 16 Solr servers, total 128 cores (including replicas)
Each machine (node) holds 2 Solr servers (JVMs), and each Solr server has 4 
shards.

5) 128 shards on 32 Solr servers, total 256 cores (including replicas)
Each machine (node) holds 4 Solr servers (JVMs), and each Solr server has 4 
shards.

Could someone advise which layout is better? Or do you have some other better
layout? The basic idea is to "divide" a powerful machine to get more Solr
servers and/or more shards. I would like to get some advice about the
trade-offs and general guidelines for the division. It would be very helpful
if you could advise an example setup for this use case.

Thanks a lot.

Shushuai


Is it possible for solr to calculate and give back the price of a product based on its sub-products

2014-06-08 Thread Gharbi Mohamed
Hi,

I am using Solr for searching Magento products in my project.
I want to know: is it possible for Solr to calculate and give back the price
of a product based on its sub-products (items)?

For instance, I have a product P1 and it is the parent of items m1, m2.
I need to get the minimal price of the items and return it as the price of
product P1.

I'm wondering if that is possible?
I need to know if Solr can do that, or if there is a feature or a way to do
it?
And finally, thank you!

regards,
Mohamed.



Re: span query with SHOUD semantic instead of MUST HAVE

2014-06-08 Thread Erick Erickson
What is your problem? I mean what kind of real-world issue are you
requiring this behavior for? Or is this mostly so you can understand
scoring better?

Very often this kind of question is a test artifact. As far as I know,
the distance isn't part of the scoring; the fact that there's an extra
token in between just isn't considered relevant (I may be dead wrong on
this, but...).

So if this is theoretical, I believe the answer is "because distance
isn't part of the scoring formula".
If this is practical, in that you need terms closer together to bubble
up to the top, then something like phrase queries with slop and boosts
would help, for example:
"aa bb"^100 OR "aa bb"~10^10 OR field:(aa bb)

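That is, an exact phrase match of "aa bb" gets the largest boost, a sloppy phrase
match within 10 positions gets a smaller boost, and the plain field:(aa bb) clause
still lets documents match even when the terms are far apart. This is just a reading
of the example above; the boost values are only illustrative.
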
Best,
Erick

On Fri, Jun 6, 2014 at 5:48 AM, 郑华斌  wrote:
> hi,
>
>
> I have two docs,
> a) "aa bb cc" and,
> b) "aa cc bb".
> The query is "aa bb". What I expected is that doc a comes first with a higher
> score than doc b, because the term distance in the query and that in doc a are
> more similar.
> After googling for a while I got it working with the span query q: "aa bb"~10.
> However, when I change my query to "aa bb dd"~10, the span query returns no
> hits because dd cannot be found in any doc. So what's a solution to this
> problem?
>
>
> Thanks.


Re: Documents Added Not Available After Commit (Both Soft and Hard)

2014-06-08 Thread Erick Erickson
That's strange. Are you saying that after you see the "No uncommitted changes.
Skipping IW.commit." message, that some time later the docs will appear
even though you haven't updated them after you see the message above?

I have upon occasion seen people get fooled by either browser or container
caching, have you checked for those? This is on the assumption that your "No
uncommitted changes" message is actually accurate and is for an interval other
than the one you're looking at. But that's a wild guess.

How are you sending updates to Solr?

I notice a couple of things on a very quick look:
1> you have your ramBufferSizeMB set to 1G. I've rarely seen this do
much good past 128M; you might want some of this RAM back
2> your soft commit is set to 18 seconds. This should show you docs
added 20 seconds ago.
3> You've set false. I'm assuming here that you're
NOT running ZooKeeper?

But this is puzzling. This hasn't been reported by others, so I'm
tending to think about something innocent-seeming about your setup
that is causing this but confess I haven't a clue what.

Erick

On Fri, Jun 6, 2014 at 6:47 AM, Justin Sweeney
 wrote:
> Hi,
>
> An application I am working on indexes documents to a Solr index. This Solr
> index is setup as a single node, without any replication. This index is
> running Solr 4.5.0.
>
> We have noticed an issue lately that is causing some problems for our
> application. The problem is that we add/update a number of documents in the
> Solr index and we have the index setup to autoCommit (hard) once every 30
> minutes. In the Solr logs, I am able to see the add command to Solr and I
> can also see Solr start the hard commit. When this hard commit occurs, we
> see the following message:
> INFO  - 2014-06-04 20:13:55.135;
> org.apache.solr.update.DirectUpdateHandler2; No uncommitted changes.
> Skipping IW.commit.
>
> This only happens sometimes, but Solr will go hours (we have seen 6-12
> hours of this behavior) before it does a hard commit where it finds changes.
> After the hard commit where the changes are found, we are then able to
> search for and find the documents that were added hours ago, but up until
> that point the documents are not searchable.
>
> We tried enabling autoSoftCommit every 5 minutes in the hope that this
> would help, but we are seeing the same behavior.
>
> Here is a sampling of the logs showing this occurring (I've trimmed it down
> to just show what is happening):
>
> INFO  - 2014-06-05 20:00:41.300;
>>> org.apache.solr.update.processor.LogUpdateProcessor; [zoomCollection]
>>> webapp=/solr path=/update params={wt=javabin&version=2} {add=[359453225]} 0
>>> 0
>>
>> INFO  - 2014-06-05 20:00:41.376;
>>> org.apache.solr.update.processor.LogUpdateProcessor; [zoomCollection]
>>> webapp=/solr path=/update params={wt=javabin&version=2} {add=[347170717]} 0
>>> 1
>>
>> INFO  - 2014-06-05 20:00:51.527;
>>> org.apache.solr.update.DirectUpdateHandler2; start
>>> commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
>>
>> INFO  - 2014-06-05 20:00:51.533; org.apache.solr.search.SolrIndexSearcher;
>>> Opening Searcher@257c43d main
>>
>> INFO  - 2014-06-05 20:00:51.533;
>>> org.apache.solr.update.DirectUpdateHandler2; end_commit_flush
>>
>> INFO  - 2014-06-05 20:00:51.545; org.apache.solr.core.QuerySenderListener;
>>> QuerySenderListener sending requests to Searcher@257c43d
>>> main{StandardDirectoryReader(segments_acl:1367002775953
>>> _2f28(4.5):C13583563/4081507 _2gl6(4.5):C2754573/193533
>>> _2g21(4.5):C1046256/296354 _2ge2(4.5):C835858/206139
>>> _2gqd(4.5):C383500/31051 _2gmu(4.5):C125197/32491 _2grl(4.5):C46906/1255
>>> _2gpj(4.5):C66480/16562 _2gra(4.5):C364/22 _2gr1(4.5):C36064/2556
>>> _2gqg(4.5):C42504/21515 _2gqm(4.5):C26821/12659 _2gqu(4.5):C24172/10240
>>> _2gqy(4.5):C697/215 _2gr2(4.5):C878/352 _2gr7(4.5):C28135/11775
>>> _2gr9(4.5):C3276/1341 _2grb(4.5):C5/1 _2grc(4.5):C3247/1219 _2grd(4.5):C6/1
>>> _2grf(4.5):C5/2 _2grg(4.5):C23659/10967 _2grh(4.5):C1 _2grj(4.5):C1
>>> _2grk(4.5):C5160/1482 _2grm(4.5):C1210/351 _2grn(4.5):C3957/1372
>>> _2gro(4.5):C7734/2207 _2grp(4.5):C220/36)}
>>
>> INFO  - 2014-06-05 20:00:51.546; org.apache.solr.core.SolrCore;
>>> [zoomCollection] webapp=null path=null
>>> params={event=newSearcher&q=d_name:ibm&distrib=false} hits=38 status=0
>>> QTime=0
>>
>> INFO  - 2014-06-05 20:00:51.546; org.apache.solr.core.QuerySenderListener;
>>> QuerySenderListener done.
>>
>> INFO  - 2014-06-05 20:00:51.547; org.apache.solr.core.SolrCore;
>>> [zoomCollection] Registered new searcher Searcher@257c43d
>>> main{StandardDirectoryReader(segments_acl:1367002775953
>>> _2f28(4.5):C13583563/4081507 _2gl6(4.5):C2754573/193533
>>> _2g21(4.5):C1046256/296354 _2ge2(4.5):C835858/206139
>>> _2gqd(4.5):C383500/31051 _2gmu(4.5):C125197/32491 _2grl(4.5):C46906/1255
>>> _2gpj(4.5):C66480/16562 _2gra(4.5):C364/22 _2gr1(4.5):C36064/2556
>>> _2gqg(4.5):C42504/21515 _2gqm(4.5):C26821/12659 _2gqu(4.5):C24172/102

Re: Tomcat restart removes the Core.

2014-06-08 Thread Erick Erickson
bq: What does "disappeared" mean? Not showing up in the admin UI?
Files on disk erased? The former may well be the persist bit, the
latter would be really weird.

And does solr.xml change when you create new cores? How do you create
new cores? Really, read:
http://wiki.apache.org/solr/UsingMailingLists

Best,
Erick

On Thu, Jun 5, 2014 at 1:56 PM, EXTERNAL Taminidi Ravi (ETI,
Automotive-Service-Solutions) 
wrote:
> Thanks for looking at my email. Below is the content of the solr.xml under the
> solr-home\solr directory:
>
> 
> 
>   
> 
> 
>   
> 
>
> -Original Message-
> From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com]
> Sent: Thursday, June 05, 2014 4:30 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Tomcat restart removes the Core.
>
> Did you put that attribute on the root element, or somewhere else? The 
> beginning of solr.xml should look like this:
>
> <?xml version="1.0" encoding="UTF-8" ?>
> <solr persistent="true">
>
> Michael Della Bitta
>
> Applications Developer
>
> o: +1 646 532 3062
>
> appinions inc.
>
> “The Science of Influence Marketing”
>
> 18 East 41st Street
>
> New York, NY 10017
>
> t: @appinions  | g+:
> plus.google.com/appinions
> 
> w: appinions.com 
>
>
> On Thu, Jun 5, 2014 at 3:52 PM, EXTERNAL Taminidi Ravi (ETI,
> Automotive-Service-Solutions)  wrote:
>
>> I updated persistent=true in the solr.xml but still no change; after a
>> restart the cores are removed.
>>
>> -Original Message-
>> From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com]
>> Sent: Wednesday, June 04, 2014 2:54 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Tomcat restart removes the Core.
>>
>> Any chance you don't have a persistent="true" attribute in your solr.xml?
>>
>> Michael Della Bitta
>>
>> Applications Developer
>>
>> o: +1 646 532 3062
>>
>> appinions inc.
>>
>> “The Science of Influence Marketing”
>>
>> 18 East 41st Street
>>
>> New York, NY 10017
>>
>> t: @appinions | g+: plus.google.com/appinions
>> (https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts)
>> w: appinions.com
>>
>>
>> On Wed, Jun 4, 2014 at 1:06 PM, EXTERNAL Taminidi Ravi (ETI,
>> Automotive-Service-Solutions)  wrote:
>>
>> > All, can anyone help me with what is going wrong in my Tomcat? When I
>> > restart Tomcat after a schema update, the cores are removed.
>> >
>> > I need to add the cores manually to get them back working.
>> >
>> > Has anyone else experienced this?
>> >
>> > Thanks
>> >
>> > Ravi
>> >
>>


SOLR Performance Benchmarking

2014-06-08 Thread rashi gandhi
Hi,

I am using SolrMeter for performance benchmarking. I am able to
successfully test my Solr setup up to 1000 queries per minute while
searching.
But when I exceed this limit, say 1500 search queries per minute,
I get "Server Refused Connection" errors from Solr.
Currently, I have only one Solr server running on a 64-bit, 4 GB RAM
machine for testing.

Please provide me some pointers to optimize Solr so that it can
handle a large number of requests (especially more than 1000 requests per
minute).
Is there any change that I can make in solrconfig.xml, or some other
change, to support this?


Thanks in Advance





DISCLAIMER
==
This e-mail may contain privileged and confidential information which
is the property of Persistent Systems Ltd. It is intended only for the
use of the individual or entity to which it is addressed. If you are
not the intended recipient, you are not authorized to read, retain,
copy, print, distribute or use this message. If you have received this
communication in error, please notify the sender and delete all copies
of this message. Persistent Systems Ltd. does not accept any liability
for virus infected mails.


Extract values from custom function for ValueSource with multiple indexable fields

2014-06-08 Thread Costi Muraru
Hi guys,

I have a custom FieldType that adds several IndexableFields for each
document.
I also have a custom function, in which I want to retrieve these indexable
fields. I can't seem to be able to do so. I have added some code snippets
below.
Any help is gladly appreciated.

Thanks,
Costi

public class MyField extends FieldType {
    @Override
    public final java.util.List<IndexableField> createFields(SchemaField field,
            Object val, float boost) {
        List<IndexableField> result = new ArrayList<IndexableField>();
        result.add(new Field(field.getName(), "field1", FIELD_TYPE));
        result.add(new Field(field.getName(), "123", FIELD_TYPE));
        result.add(new Field(field.getName(), "ABC", FIELD_TYPE));
        return result;
    }
}


public class MyFunctionParser extends ValueSourceParser {
    @Override
    public ValueSource parse(FunctionQParser fqp) throws SyntaxError {
        ValueSource fieldName = fqp.parseValueSource();
        return new MyFunction(fieldName);
    }
}

public class MyFunction extends ValueSource {
    ...
    @Override
    public FunctionValues getValues(Map context, AtomicReaderContext readerContext)
            throws IOException {
        final FunctionValues values = valueSource.getValues(context, readerContext);
        LOG.debug("Value is: " + values.strVal(doc)); // prints "123" - how can I
        // retrieve the "field1" and "ABC" indexable fields as well?
    }
}


Performance/scaling with custom function queries

2014-06-08 Thread Robert Krüger
Hi,

let's say I have an index that contains a field of type BinaryField
called "fingerprint" that stores a few (let's say 100) bytes that are
some kind of digital fingerprint-like thing.

Let's say I want to perform queries on that field to achieve sorting
or filtering based on a kind of custom distance function
"customDistance", i.e. I input a reference "fingerprint" and Solr
returns either all documents sorted by
customDistance(<reference fingerprint>, <document fingerprint>) or uses
that in an frange expression for filtering.

I have read http://wiki.apache.org/solr/SolrPerformanceFactors and I
do understand that using function queries with a custom function is
definitely an expensive thing as it will result in what is called a
"full table scan" in the sql world, i.e. data from all documents needs
to be touched to select the correct documents or sort by a function's
result.

Given all that and provided, I have to use a custom function for my
needs, I would like to know a few more details about solr architecture
to understand what I have to look out for.

I will have potentially millions of records. Does the data contained
in other index fields play a role when I only use the "fingerprint"
field for sorting and searching when it comes to RAM usage? I am
hoping to calculate that my RAM should be able to accommodate the
fingerprint data of all available documents for the queries to be fast
but not fingerprint data and all other indexed or stored data.

Example: My fingerprint data needs 100bytes per document, my other
indexed field data needs 900 bytes per document. Will I need 100MB or
1GB to fit all data that is needed to process one query in memory?

Are there other things to be aware of?

Thanks,

Robert


Re: Solr Realtime Get RemoteSolrException: Expected mime type application/xml but got text/html

2014-06-08 Thread Shalin Shekhar Mangar
Since it is returning a 404, I guess the real time get handler is not
enabled on your remote Solr.

Make sure that your solrconfig.xml has the following somewhere:

  <requestHandler name="/get" class="solr.RealTimeGetHandler">
    <lst name="defaults">
      <str name="omitHeader">true</str>
      <str name="wt">json</str>
      <str name="indent">true</str>
    </lst>
  </requestHandler>


On Sat, Jun 7, 2014 at 1:41 AM, Songtao Zheng 
wrote:

> Solr version on the remote server: solr-4.3.1. I am trying to use Solr Realtime
> Get to retrieve documents before
> commit. My code:
>
> class Test3
> {
>   static main(args)
>   {
> def test = new Test3()
> test.run()
>   }
>
>   private run()
>   {
> String url = "DEV_SERVER:8983/solr/emr"
>
> HttpSolrServer solr = new HttpSolrServer(url)
>
> SolrQuery q = new SolrQuery();
> q.setRequestHandler("/get");
> q.set("rid",
> "6662c0f2.ee6a64fe.588j6qohe.9kd087u.0r00dg.6kr5pc2as0qu9m4ibr7f7");
>
> QueryRequest req = new QueryRequest(q);
> req.setResponseParser(new BinaryResponseParser());
>
> println "=="
> rsp = req.process(solr);// ERROR
>   }
> }
>
> *The error stacktrace is:*
> Caught:
> org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
> Expected mime type application/octet-stream but got
> text/html. 
> 
> 
> Error 404 Not Found
> 
> HTTP ERROR 404
> Problem accessing /solr/emr/get. Reason:
> Not FoundPowered by
> Jetty://
>
> org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
> Expected mime type application/octet-stream but got text/html.
> 
> 
> Error 404 Not Found
> 
> HTTP ERROR 404
> Problem accessing /solr/emr/get. Reason:
> Not FoundPowered by
> Jetty://
>
> at
>
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:459)
> at
>
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
> at
>
> org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:91)
> at
> org.apache.solr.client.solrj.request.QueryRequest$process.call(Unknown
> Source)
> at com.att.songtao.test.Test3.run(Test3.groovy:48)
> at com.att.songtao.test.Test3.this$2$run(Test3.groovy)
> at com.att.songtao.test.Test3$this$2$run.call(Unknown Source)
> at com.att.songtao.test.Test3.main(Test3.groovy:14)
>
>
> I am following the Realtime Get documentation and added updateLog to the
> updateHandler in solrconfig.xml. On my localhost "localhost:8983/solr/emr"
> (version solr-4.7.2) Realtime Get works perfectly, but running it against the
> remote server throws the above error.
>
> Anyone could provide the insight?
>
> Thanks,
>
> Songtao
>



-- 
Regards,
Shalin Shekhar Mangar.