Calculate a sum.

2013-01-14 Thread stockii
hello.

My problem is that I need to calculate a sum of amounts. The amount is in
my index (stored=true). My PHP script fetches all values with paging, but if a
request takes too long, Jetty kills the process and I get a broken
pipe.

What is the best/fastest way to get the values of many fields from the index?
Is there a ResponseHandler for exports? Or which one is the fastest?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Calculate-a-sum-tp4033091.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Calculate a sum.

2013-01-14 Thread Rafał Kuć
Hello!

Fetching all the documents, especially for a query that returns many
of them, can be a pain.

However, Solr does have a StatsComponent
(http://wiki.apache.org/solr/StatsComponent) that can do the summing for
you, though your field would have to be numeric and indexed. 

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch
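For reference, a StatsComponent request that pushes the summing to Solr, instead of paging every stored value to the client, can be sketched like this. The core URL and the field name "amount" are assumptions for illustration:

```python
from urllib.parse import urlencode

# Build a StatsComponent request that asks Solr to compute the sum of a
# numeric field server-side, instead of paging all values to the client.
def stats_sum_url(base_url, field):
    params = {
        "q": "*:*",
        "rows": 0,            # no documents needed, only the aggregate
        "stats": "true",
        "stats.field": field,
        "wt": "json",
    }
    return base_url + "/select?" + urlencode(params)

url = stats_sum_url("http://localhost:8983/solr/collection1", "amount")
print(url)
```

The JSON response should then carry the aggregate under stats/stats_fields/amount/sum, so no paging loop is needed at all.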

 hello.

 My problem is, that i need to calculate a sum of amounts. this amount is in
 my index (stored=true). my php script get all values with paging. but if a
 request takes too long, jetty is killing this process and i get a broken
 pipe.

 Which is the best/fastest way to get the values of many fields from index?
 exists an ResponseHandler for exports? Or which is the fastest?



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Calculate-a-sum-tp4033091.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Calculate a sum.

2013-01-14 Thread stockii
Hey, thanks for your reply. 

I forgot to say: StatsComponent doesn't work for our application --
too slow and buggy. But I tested this component with version 1.4 ...
maybe there have been some bugfixes in 4.0?

That is the reason for calculating the sum on the client side over some pages,
but sometimes it's too much for the server.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Calculate-a-sum-tp4033091p4033097.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: CoreAdmin STATUS performance

2013-01-14 Thread Shahar Davidson
Hi Stefan,

I have opened issue SOLR-4302 and attached the suggested patch.

Regards,

Shahar.

-Original Message-
From: Stefan Matheis [mailto:matheis.ste...@gmail.com] 
Sent: Sunday, January 13, 2013 3:11 PM
To: solr-user@lucene.apache.org
Subject: Re: CoreAdmin STATUS performance

Shahar


would you mind if I ask you to open a JIRA issue for that, attaching your 
changes as a typical patch?
Perhaps we could use that for the UI, in those cases where we don't need the 
full set of information ..

Stefan 


On Sunday, January 13, 2013 at 12:28 PM, Shahar Davidson wrote:

 Shawn, Per and anyone else who has participated in this thread - thank you!
 
 I have finally resorted to applying a minor patch to the Solr code. 
 I noticed that most of the STATUS request's time is spent collecting 
 index-related info (such as segmentCount, sizeInBytes, numDocs, etc.).
 I added support for a new parameter to the STATUS request which, if present, 
 skips collection of the index info (hence only general static info, among it 
 the core name, is returned) -- this cuts the request time by two orders of 
 magnitude!
 In my case, it decreased the request time from around 800ms to around 1ms-4ms.
 
 Regards,
 
 Shahar.
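The shortcut Shahar describes could be exercised like this. The parameter name "indexInfo" is an assumption in this sketch -- check SOLR-4302 for the name used in the committed patch:

```python
from urllib.parse import urlencode

# Sketch of the faster CoreAdmin STATUS call: skip the expensive per-core
# index statistics (segmentCount, sizeInBytes, numDocs, ...) and return
# only the static core info. The "indexInfo" parameter name is assumed.
def core_status_url(admin_url, with_index_info=False):
    params = {"action": "STATUS", "wt": "json"}
    if not with_index_info:
        params["indexInfo"] = "false"
    return admin_url + "?" + urlencode(params)

url = core_status_url("http://localhost:8983/solr/admin/cores")
print(url)
```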
 
 -Original Message-
 From: Shawn Heisey [mailto:s...@elyograg.org]
 Sent: Thursday, January 10, 2013 5:14 PM
 To: solr-user@lucene.apache.org (mailto:solr-user@lucene.apache.org)
 Subject: Re: CoreAdmin STATUS performance
 
 On 1/10/2013 2:09 AM, Shahar Davidson wrote:
  As for your first question, the core info needs to be gathered upon every 
  search request because cores are created dynamically.
  When a user initiates a search request, the system must be aware of 
  all available cores in order to execute distributed search on _all_ 
  relevant cores. (the user must get reliable and most up to date data) The 
  reason that 800ms seems a lot to me is because the overall execution time 
  takes about 2500ms and a large part of it is due to the STATUS request.
  
  The minimal interval concept is a good idea and indeed we've considered 
  it, yet it poses a slight problem when building a RT system which needs to 
  return to most up to date data.
  I am just trying to understand if there's some other way to hasten 
  the STATUS reply (for example, by asking the STATUS request to 
  return just certain core attributes, such as name, instead of 
  collecting
  everything)
 
 
 
 Are there a *huge* number of SolrJ clients in the wild, or is it something 
 like a server farm where you are in control of everything? If it's the 
 latter, what I think I would do is have an asynchronous thread that 
 periodically (every few seconds) updates the client's view of what cores 
 exist. When a query is made, it will use that information, speeding up your 
 queries by 800 milliseconds and ensuring that new cores will not have long 
 delays before they become searchable. If you have a huge number of clients in 
 the wild, it would still be possible, but ensuring that those clients get 
 updated might be hard.
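Shawn's suggestion of an asynchronous refresher thread can be sketched in a few lines; the fetch_cores callable stands in for a real CoreAdmin STATUS request:

```python
import threading
import time

# Client-side cache of the core list: a daemon thread refreshes it every
# few seconds, so query-time code never pays the STATUS cost.
class CoreListCache:
    def __init__(self, fetch_cores, interval=5.0):
        self._fetch = fetch_cores
        self._interval = interval
        self._lock = threading.Lock()
        self._cores = fetch_cores()          # initial synchronous fill
        t = threading.Thread(target=self._refresh_loop, daemon=True)
        t.start()

    def _refresh_loop(self):
        while True:
            time.sleep(self._interval)
            cores = self._fetch()            # fetch outside the lock
            with self._lock:
                self._cores = cores

    def cores(self):
        with self._lock:
            return list(self._cores)

cache = CoreListCache(lambda: ["core0", "core1"], interval=1.0)
print(cache.cores())
```

Queries read the cached list, so a newly created core becomes visible within one refresh interval rather than delaying every request.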
 
 If you also delete cores as well as add them, that complicates things. 
 You'd have to have the clients be smart enough to exclude the last core on 
 the list (by whatever sorting mechanism you require), and you'd have to wait 
 long enough (30 seconds, maybe?) before *actually* deleting the last core to 
 be sure that no clients are accessing it.
 
 Or you could use SolrCloud, as Per suggested, but with 4.1, not the released 
 4.0. SolrCloud manages your cores for you automatically. 
 You'd probably be using a slightly customized SolrCloud, including the custom 
 hashing capability added by SOLR-2592. I don't know what other customizations 
 you might need.
 
 Thanks,
 Shawn
 
 
 Email secured by Check Point





Re: Calculate a sum.

2013-01-14 Thread Mikhail Khludnev
Stored fields are famous for their slowness, as they require two I/O
operations per doc. You can spend some heap on uninverting the index and
utilize wiki.apache.org/solr/StatsComponent
Let us know whether it works for you.
On 14.01.2013 13:14, stockii stock.jo...@googlemail.com
wrote:

 hello.

 My problem is, that i need to calculate a sum of amounts. this amount is in
 my index (stored=true). my php script get all values with paging. but if
 a
 request takes too long, jetty is killing this process and i get a broken
 pipe.

 Which is the best/fastest way to get the values of many fields from index?
 exists an ResponseHandler for exports? Or which is the fastest?



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Calculate-a-sum-tp4033091.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: theory of sets

2013-01-14 Thread Uwe Reh

Am 08.01.2013 10:26, schrieb Uwe Reh:

OK, OK,
I will try it again with dynamic fields.


NO!
dynamic fields are nice, but not for my problem. :-(

I got more than *52* new fields.
I was wrong, the impact on searching is really reasonable. But have you 
ever used the Admin's Schema Browser with that many fields? I suppose 
never; my installation (4.1) freezes Firefox while the JS job runs 
into a timeout.

Most of all, I don't like it because having that many fields 'smells' to me.

Before anyone asks the XY question:
The index is intended for a library's catalog, and the quest is "Find all 
members of a series (e.g. Penguin Books paperbacks) and order them by 
their sortkey". Unfortunately, titles may belong to several (sub)series 
with different sortkeys.


Still seeking better approaches.
Uwe



Re: configuring schema to match database

2013-01-14 Thread Jens Grivolla

On 01/11/2013 06:14 PM, Gora Mohanty wrote:

On 11 January 2013 22:30, Jens Grivolla j+...@grivolla.net wrote:
[...]

Actually, that is what you would get when doing a join in an RDBMS, the 
cross-product of your tables. This is NOT AT ALL what you typically do in Solr.

Best start the other way around, think of Solr as a retrieval system, not a 
storage system. What are your queries? What do you want to find, and what 
criteria do you use to search for it?

[...]

Um, he did describe his desired queries, and there was a reason
that I proposed the above schema design.


He said he wants queries such as "users who have taken courseA and are 
fluent in english", which is exactly one case I was describing.



UserA has taken courseA, courseB and courseC and has writingskill
good verbalskill good for english and writingskill excellent
verbalskill excellent for spanish UserB has taken courseA, courseF,
courseG and courseH and has writingskill fluent verbalskill fluent
for english and writingskill good verbalskill good for italian


Unless the index is becoming huge, I feel that it is better to
flatten everything out rather than combine fields, and
post-process the results.


Then please show me the query to find users that are fluent in spanish 
and english. Bonus points if you manage to not retrieve the same user 
several times. (Hint, your schema stores only one language skill per row).
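Jens's point can be made concrete with a toy model. A flattened one-skill-per-row schema means no single row can satisfy both language conditions, while a multivalued field on one user document makes the conjunction a plain filter. The field name "lang_skill" and its encoding are invented for this sketch:

```python
# One document per user, skills encoded as language_level tokens in a
# multivalued field (a made-up encoding for illustration).
users = [
    {"id": "userA", "lang_skill": ["english_good", "spanish_excellent"]},
    {"id": "userB", "lang_skill": ["english_fluent", "italian_good"]},
    {"id": "userC", "lang_skill": ["english_fluent", "spanish_fluent"]},
]

def fluent_in(docs, *languages):
    # Keep only users whose skill set contains lang_fluent for every
    # requested language -- one hit per user, no duplicates possible.
    wanted = {lang + "_fluent" for lang in languages}
    return [d["id"] for d in docs if wanted <= set(d["lang_skill"])]

print(fluent_in(users, "english", "spanish"))  # only userC qualifies
```

In Solr terms this corresponds to a multivalued string field with a filter query per required token, which is exactly what the flattened row-per-skill layout cannot express.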


Regards,
Jens



Re: configuring schema to match database

2013-01-14 Thread Gora Mohanty
On 14 January 2013 16:59, Jens Grivolla j+...@grivolla.net wrote:
[...]
 Then please show me the query to find users that are fluent in spanish and
 english. Bonus points if you manage to not retrieve the same user several
 times. (Hint, your schema stores only one language skill per row).

Doh! You are right, of course. Brainfart from my side.

Regards,
Gora


Re: Calculate a sum.

2013-01-14 Thread Edward Garrett
I've had perfectly fine performance with StatsComponent, but have only
tested with 50,000 documents. For example, I have a field syllables and a
numeric field syllables_count; I then sum the syllable count for any
search query. How many documents are you working with?

On Mon, Jan 14, 2013 at 10:54 AM, Mikhail Khludnev
mkhlud...@griddynamics.com wrote:
 Stored fields are famous for its' slowness as well as they requires two io
 operation per doc. You can spend some heap for uninverting the index and
 utilize wiki.apache.org/solr/StatsComponent
 Let us know whether it works for you.
 On 14.01.2013 13:14, stockii stock.jo...@googlemail.com
 wrote:

 hello.

 My problem is, that i need to calculate a sum of amounts. this amount is in
 my index (stored=true). my php script get all values with paging. but if
 a
 request takes too long, jetty is killing this process and i get a broken
 pipe.

 Which is the best/fastest way to get the values of many fields from index?
 exists an ResponseHandler for exports? Or which is the fastest?



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Calculate-a-sum-tp4033091.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
edge


Re: configuring schema to match database

2013-01-14 Thread Jens Grivolla

On 01/14/2013 12:50 PM, Gora Mohanty wrote:

On 14 January 2013 16:59, Jens Grivolla j+...@grivolla.net wrote:
[...]

Then please show me the query to find users that are fluent in spanish and
english. Bonus points if you manage to not retrieve the same user several
times. (Hint, your schema stores only one language skill per row).


Doh! You are right, of course. Brainfart from my side.


Ok, I was starting to wonder if I was the one missing something. 
Re-reading what I wrote I see I may have sounded a bit rude, that was 
not my intention, sorry.


Best,
Jens




Re: How to manage solr cloud collections-sharding?

2013-01-14 Thread Erick Erickson
I can at least answer part of this
see inline.


On Sun, Jan 13, 2013 at 11:44 AM, adfel70 adfe...@gmail.com wrote:

 Hi,
 I know a few questions on this issue have already been posted, but I didn't
 find full answers in any of those posts.

 I'm using solr-4.0.0

[EOE] I'd _really_ start working with a nightly build instead, there have
been a lot of improvements and the RC1 for 4.1 may well be cut this week.

I need my solr cluster to have multiple collections, each collection with
 different configuration (at least different schema.xml file).
 I follow the solrCloud tutorial page and execute this command:
 /java -Dbootstrap_confdir=./solr/collection1/conf
 -Dcollection.configName=myconf -DzkRun -DnumShards=5 -jar start.jar/
 when I start the solr servers I have collection1 in clusterstate.json with
 each node assigned to some shard.

 questions so far:
 1.Is this first command 100% necessary?

[EOE] No, it's not. You could use the zkCli commands here:
http://wiki.apache.org/solr/SolrCloud#Command_Line_Util. See especially the
"bootstrap all the conf dirs in solr.xml" example. But somehow you have
to send all the relevant info (configuration files etc.) to Zookeeper;
this command is a convenient way to do that.

 2. Do I have to define the number of shards before starting the solr
 instances?

[EOE] Yes, unless you're doing custom sharding.

 3. What if I want to add a shard after I started all solr instances and
 haven't indexed yet?

[EOE] Then you have to re-index currently, unless you are doing custom
hashing

 4. what if I want to add a shard after indexing?

[EOE] You have to re-index (and reconfigure your ZK state) unless you're
doing custom sharding

 5. what is the role that clusterstate.json plays? is it just a json file to
 show in the GUI? Or is it the only file that persists the current state of
 the cluster?

[EOE] What it does on ZK I don't know, but I've only seen it used as
something for the GUI to read. Actually, all the other views are just
prettifying this file.

 6. Can I edit it manually? should I?

[EOE]  I've never heard of anyone even wanting to, you'd have to ask
someone who knows way more about ZK than I do.


 I add another schema-B.xml file to the zookeeper and open another
 collection
 by using coreAdmin Rest API.
 I want this collection to have 10 shards and not 5 as I defined for the
 previous collection.
 So I run
 /http://server:port
 /solr/admin/cores?action=CREATE&name=coreX&instanceDir=path_to_instance_directory&config=config_file_name.xml&schema=schem_file_name.xml&dataDir=data&shard=shard//
 10 times with different / each run.

 [EOE] Currently, by adding the shard= parameter, you're now doing custom
sharding. Mark just raised a JIRA about this recently, don't quite know
what the current status of this is. You're in kind of uncharted territory
here...

 questions:
 1. is this an appropriate way to use the core admin API? should I specify
 the shard Id? I do it because it gives me a way to control the number of
 shards (each new shard id creates a new shard). but should I use it this
 way?

[EOE] Right, but currently this means you do NOT get automatic distributed
indexing, your indexing program has to send the document to the appropriate
shard, preferably the leader. Like I said, this is kind of new.

 2. Can I have different number of shards in different collections on the
 same cluster?
 3. If yes - then what is the purpose of the first bootstrap command?

 [EOE] Well, without some kind of bootstrap, how would _any_ cluster
information ever get to Zookeeper? The first bootstrap command is
essentially the fire-and-forget approach, so you don't have to keep track
of anything to use SolrCloud.


 another question:
 I saw that in 4.1 version, each shard has another parameter - range. what
 is

[EOE] Haven't worked with this yet

 this parameter used for? would I have to re-index when upgrading from 4.0
 to
 4.1?

 [EOE] You shouldn't have to re-index


 this will help a lot in understanding the whole collection-sharding
 architecture in solr cloud.
 Thanks



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/How-to-manage-solr-cloud-collections-sharding-tp4033009.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: SolrCloud sort inconsistency

2013-01-14 Thread Erick Erickson
Unless it's a cut-n-paste error, you don't have an & in front of the sort
parameter, so you're not sorting at all. You should have a sort
section in your response where the params are echoed, something like:

"params":{
  "sort":"id desc",
  "fl":"id",
  "cache":"False",
  "q":"id:*",
  "rows":"10"}},
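Erick's point about the run-together parameters can be checked by building the query string with a library, so that each parameter, including sort, is properly delimited with "&" and arrives separately:

```python
from urllib.parse import urlencode, parse_qs

# Build the query string programmatically so no "&" can go missing, then
# parse it back the way the server would.
params = urlencode({
    "q": "id:*",
    "rows": 10,
    "fl": "id",
    "sort": "id desc",
    "cache": "false",
})
parsed = parse_qs(params)
print(parsed["sort"])  # ['id desc'] -- now seen as its own parameter
```

With a hand-typed URL the same effect needs the whole URL quoted in the shell and a literal "&" between every parameter; otherwise "fl=id=sort=id desc" arrives as one mangled fl value, exactly as echoed in the responses below.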


On Sun, Jan 13, 2013 at 6:42 PM, yriveiro yago.rive...@gmail.com wrote:

 How is it possible that this sorted query returns different results?

 The highest value is the id P2450024023, but sometimes the value returned
 is not the highest.

 This is an example; the second curl request shows the correct result.

 NOTE: I ran the query while an indexing process was running.

 ➜  ~  curl -H Cache-Control: no-cache
 http://192.168.1.241:8983/solr/ST-SHARD_0212/query\?q\=id
 :\*\rows\=10\fl\=id\=sort\=id%20desc\cache\=False
 {
   responseHeader:{
 status:0,
 QTime:5,
 params:{
   cache:False,
   rows:10,
   fl:id=sort=id desc,
   q:id:*}},
   response:{numFound:2387312,start:0,maxScore:1.0,docs:[
   {
 id:P2443605077},
   {
 id:P2443588094},
   {
 id:P2443647855},
   {
 id:P2443613193},
   {
 id:P2443572098},
   {
 id:P2443562507},
   {
 id:P2443643935},
   {
 id:P2443556464},
   {
 id:P2443625267},
   {
 id:P2443580781}]
   }}
 ➜  ~  curl -H Cache-Control: no-cache
 http://192.168.1.241:8983/solr/ST-SHARD_0212/query\?q\=id
 :\*\rows\=10\fl\=id\=sort\=id%20desc\cache\=False
 {
   responseHeader:{
 status:0,
 QTime:4,
 params:{
   cache:False,
   rows:10,
   fl:id=sort=id desc,
   q:id:*}},
   response:{numFound:2387312,start:0,maxScore:1.0,docs:[
   {
 id:P2450024023},
   {
 id:P2450017490},
   {
 id:P2450062568},
   {
 id:P2450053498},
   {
 id:P2449990839},
   {
 id:P2449973572},
   {
 id:P2449957535},
   {
 id:P2450099098},
   {
 id:P2450090195},
   {
 id:P2450072528}]
   }}
 ➜  ~  curl -H Cache-Control: no-cache
 http://192.168.1.241:8983/solr/ST-SHARD_0212/query\?q\=id
 :\*\rows\=10\fl\=id\=sort\=id%20desc\cache\=False
 {
   responseHeader:{
 status:0,
 QTime:6,
 params:{
   cache:False,
   rows:10,
   fl:id=sort=id desc,
   q:id:*}},
   response:{numFound:2387312,start:0,maxScore:1.0,docs:[
   {
 id:P2450024023},
   {
 id:P2450017490},
   {
 id:P2450062568},
   {
 id:P2450053498},
   {
 id:P2449990839},
   {
 id:P2449973572},
   {
 id:P2449957535},
   {
 id:P2450099098},
   {
 id:P2450090195},
   {
 id:P2450072528}]
   }}
 ➜  ~



 -
 Best regards
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/SolrCloud-sort-inconsistency-tp4033046.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: SolrCloud sort inconsistency

2013-01-14 Thread Erick Erickson
P.S. of course your sorting won't reflect documents that haven't been
committed yet. So if you straighten out the params your lists should be in
order, but the documents returned may change depending on whether your
indexing process adds docs between calls

Erick


On Mon, Jan 14, 2013 at 7:25 AM, Erick Erickson erickerick...@gmail.comwrote:

 Unless it's a cut-n-paste error, you don't have an & in front of the sort
 parameter, so you're not sorting at all. You should have a sort
 section in your response where the params are echoed, something like:

 "params":{
   "sort":"id desc",
   "fl":"id",
   "cache":"False",
   "q":"id:*",
   "rows":"10"}},


 On Sun, Jan 13, 2013 at 6:42 PM, yriveiro yago.rive...@gmail.com wrote:

 How is possible that this sorted query returns different results?

 The highest value is the id P2450024023, sometimes the value returned is
 not
 the highest.

 This is an example, the second curl request is the correct result.

 NOTE: I did the query when a indexing process was running.

 ➜  ~  curl -H Cache-Control: no-cache
 http://192.168.1.241:8983/solr/ST-SHARD_0212/query\?q\=id
 :\*\rows\=10\fl\=id\=sort\=id%20desc\cache\=False
 {
   responseHeader:{
 status:0,
 QTime:5,
 params:{
   cache:False,
   rows:10,
   fl:id=sort=id desc,
   q:id:*}},
   response:{numFound:2387312,start:0,maxScore:1.0,docs:[
   {
 id:P2443605077},
   {
 id:P2443588094},
   {
 id:P2443647855},
   {
 id:P2443613193},
   {
 id:P2443572098},
   {
 id:P2443562507},
   {
 id:P2443643935},
   {
 id:P2443556464},
   {
 id:P2443625267},
   {
 id:P2443580781}]
   }}
 ➜  ~  curl -H Cache-Control: no-cache
 http://192.168.1.241:8983/solr/ST-SHARD_0212/query\?q\=id
 :\*\rows\=10\fl\=id\=sort\=id%20desc\cache\=False
 {
   responseHeader:{
 status:0,
 QTime:4,
 params:{
   cache:False,
   rows:10,
   fl:id=sort=id desc,
   q:id:*}},
   response:{numFound:2387312,start:0,maxScore:1.0,docs:[
   {
 id:P2450024023},
   {
 id:P2450017490},
   {
 id:P2450062568},
   {
 id:P2450053498},
   {
 id:P2449990839},
   {
 id:P2449973572},
   {
 id:P2449957535},
   {
 id:P2450099098},
   {
 id:P2450090195},
   {
 id:P2450072528}]
   }}
 ➜  ~  curl -H Cache-Control: no-cache
 http://192.168.1.241:8983/solr/ST-SHARD_0212/query\?q\=id
 :\*\rows\=10\fl\=id\=sort\=id%20desc\cache\=False
 {
   responseHeader:{
 status:0,
 QTime:6,
 params:{
   cache:False,
   rows:10,
   fl:id=sort=id desc,
   q:id:*}},
   response:{numFound:2387312,start:0,maxScore:1.0,docs:[
   {
 id:P2450024023},
   {
 id:P2450017490},
   {
 id:P2450062568},
   {
 id:P2450053498},
   {
 id:P2449990839},
   {
 id:P2449973572},
   {
 id:P2449957535},
   {
 id:P2450099098},
   {
 id:P2450090195},
   {
 id:P2450072528}]
   }}
 ➜  ~



 -
 Best regards
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/SolrCloud-sort-inconsistency-tp4033046.html
 Sent from the Solr - User mailing list archive at Nabble.com.





Re: theory of sets

2013-01-14 Thread Alexandre Rafalovitch
Does this have to be in Solr?

Given the pre-computed nature of the sub-series, maybe you can encode both
series name and series sort order in a separate structure designed for it.
Something like Neo4J comes to mind: http://www.neo4j.org/ .

Or, this could be a good question for StackOverflow on what handles this
kind of scenarios best.

Regards,
   Alex.
On Mon, Jan 14, 2013 at 5:55 AM, Uwe Reh r...@hebis.uni-frankfurt.de wrote:

 Before anyone asks the XY question.
 The index is intended for a library's catalog and the quest is Find all
 members of a series (eg. penguin books, paperbacks) and order them on their
 sortkey. Unfortunately titles may belong to several (sub)series with
 different sortkeys.

 Still seeking for better approaches.




Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


Re: Calculate a sum.

2013-01-14 Thread stockii
Mikhail Khludnev wrote
 You can spend some heap for uninverting the index and 
 utilize wiki.apache.org/solr/StatsComponent

what do you mean by this?


Edward Garrett wrote
 how many documents are you working with? 

~90 million documents ...



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Calculate-a-sum-tp4033091p4033152.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: regex and highlighter component: highlight and return individual fragments inside a snippet

2013-01-14 Thread Dmitry Kan
it seems, hl.snippets does what I'm after, also discussed here:

http://search-lucene.com/m/yxCOc1X0uY42/highlighting+multiple+occurrences&subj=highlighting+multiple+occurrences

http://wiki.apache.org/solr/HighlightingParameters#hl.snippets

Dmitry

On Mon, Jan 14, 2013 at 3:14 PM, Dmitry Kan solrexp...@gmail.com wrote:

 Hello!

 I'm playing with the regex feature of highlighting in SOLR. The regex I
 have is pretty simple and, given a keyword query, it hits in a few places
 inside each document.
 Is there a way of highlighting and returning individual fragments with
 this approach? That is, if the matches are found let's say in the
 beginning, middle and end parts of the document, then return only these
 three groups inside a highlighter snippet?

 If there isn't such a functionality, do you know, what would it take to
 implement one (solr/lucene 3.4)?

 Regards,

 Dmitry Kan



How to use shardId

2013-01-14 Thread starbuck
Hi all,

I am trying to set up a solr cloud cluster with 2 collections, each with 4
shards and 2 replicas, hosted by 4 solr instances. If the numShards param is
set to 4 and all solr instances are started one after another, it seems to
work fine.

What I wanted to do now is remove numShards from JAVA_OPTS and define
each core with a shardId. Here is my current solr.xml of the first and
second (in the second there is another instanceDir, the rest is the same)
solr instance:



Here is solr.xml of the third and fourth solr instance: 



But it seems that solr doesn't accept the shardId, or omits it. What I really
get is 2 collections, each with 2 shards and 8 replicas (2 per solr
instance).
Either the functionality is not really clear to me, or there has to be a
config failure.

It would very helpful if anyone could give me a hint.

Thanks.
starbuck





--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-use-shardId-tp4033186.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Solr 3.5 and sharding

2013-01-14 Thread Michael Ryan
If you have the same documents -- with the same uniqueKey -- across multiple 
shards, the count will not be what you expect. You'll need to ensure that each 
document exists only on a single shard.

-Michael
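A toy illustration of why duplicate uniqueKeys make distributed counts drift. This is a simplified model, not Solr's actual merge logic: a document that lives on several shards is collapsed into one result when hit lists are merged, so the merged count disagrees with the sum of per-shard counts:

```python
# Per-shard hit sets for the same query; doc3 is duplicated across shards.
shard1_hits = {"doc1", "doc2", "doc3"}
shard2_hits = {"doc3", "doc4"}

naive_total = len(shard1_hits) + len(shard2_hits)   # 5, the per-shard sum
merged_total = len(shard1_hits | shard2_hits)       # 4, after collapsing doc3

print(naive_total, merged_total)
```

Ensuring each uniqueKey lives on exactly one shard makes the two numbers agree, which is Michael's recommendation.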

-Original Message-
From: Jean-Sebastien Vachon [mailto:jean-sebastien.vac...@wantedanalytics.com] 
Sent: Monday, January 14, 2013 9:59 AM
To: solr-user@lucene.apache.org
Subject: Solr 3.5 and sharding 

Hi,

I'm setting up a small Solr setup consisting of 1 master node and 4 shards. For 
now, all four shards contain the exact same data. When I perform a query on 
each individual shard for the word "java" I receive the same number of 
docs (as expected). However, when I go through the master node using the 
shards parameter, the number of results is slightly off by a few documents. 
There is nothing special in my setup, so I'm looking for hints on why I am 
getting this problem

Thanks


Re: RSS tutorial that comes with the apache-solr not indexing

2013-01-14 Thread Lance Norskog
This example may be out of date, if the RSS feeds from Slashdot have 
changed. If you know XML and XPaths, try this:
find an RSS feed from somewhere that works, and compare the xpaths in it 
vs. the xpaths in the DIH script.


On 01/13/2013 07:38 PM, bibhor wrote:

Hi
I am trying to use the RSS tutorial that comes with apache-solr.
I am not sure if I missed anything, but when I do a full-import no indexing
happens.
These are the steps that I am taking:

1) Download apache-solr-3.6.2 (http://lucene.apache.org/solr/)
2) Start solr with: java -Dsolr.solr.home=./example-DIH/solr/ -jar
start.jar
3) Go to the URL:
http://192.168.1.12:8983/solr/rss/dataimport?command=full-import
4) When I do this it says: "Indexing completed. Added/Updated: 0 documents.
Deleted 0 documents."

Now I know that the default example is getting the RSS from:
http://rss.slashdot.org/Slashdot/slashdot
This default example appears empty when I view it in Chrome. It does have XML
data in the source, but I am not sure if this has anything to do with the
import failure.

I also modified the rss-config so that I can test other RSS sources. I used
http://www.feedforall.com/sample.xml and updated rss-config.xml, but this
did the same and did not add/update any documents.
Any help is appreciated.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/RSS-tutorial-that-comes-with-the-apache-solr-not-indexing-tp4033067.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: Schema Field Names i18n

2013-01-14 Thread Lance Norskog
Will a field have different names in different languages? There is no 
facility for 'aliases' of field names. Erick is right, this sounds like 
you need query and update components to implement this. Also, you might 
try using URL-encoding for the field names. This would save my sanity.
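Both ideas -- a client-side alias map to the schema's canonical names, plus URL-encoding as a fallback for unmapped names with non-ASCII characters -- can be sketched as follows. The sample field names are invented for illustration:

```python
from urllib.parse import quote

# Map localized field names to the schema's canonical (English) names.
# Unmapped names are passed through URL-encoded so they stay safe in a URL.
FIELD_ALIASES = {
    "titre": "title",      # French
    "auteur": "author",
    "titel": "title",      # German
}

def canonical_field(name):
    return FIELD_ALIASES.get(name.lower(), quote(name, safe=""))

print(canonical_field("Titre"))   # -> "title"
print(canonical_field("año"))     # unmapped: URL-encoded passthrough
```

In the setups Erick describes, this mapping would live either in the UI layer or in middleware, before the query string ever reaches Solr.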


On 01/10/2013 04:56 AM, Erick Erickson wrote:

There's no really easy way that I know of. I've seen several approaches
used though

1) Do it in the UI. This assumes that your users aren't typing in raw
queries; they're picking field names from a drop-down or similar. Then the
UI maps the chosen fields into what the schema defines.

2) Do it in the middleware when assembling the query to pass through. Be
careful with the translations though; there always seem to be edge cases.

3) What you're suggesting. Unless you're really fluent in parsers (they
give me indigestion) I'd think about a query component.

Best
Erick


On Wed, Jan 9, 2013 at 7:36 PM, Daryl Robbins daryl.robb...@mxi.com wrote:


Anyone have experience with internationalizing the field names in the SOLR
schema, so users in different languages can specify fields in their own
language? My first thoughts would be to create a custom search component or
query parser than would convert localized field names back to the English
names in the schema, but I haven't dived in too deep yet. Any input would
be greatly appreciated.

Thanks,

Daryl



__
* This message is intended only for the use of the individual or entity to
which it is addressed, and may contain information that is privileged,
confidential and exempt from disclosure under applicable law. Unless you
are the addressee (or authorized to receive for the addressee), you may not
use, copy or disclose the message or any information contained in the
message. If you have received this message in error, please advise the
sender by reply e-mail, and delete the message, or call +1-613-747-4698. *






Re: Solr 4.0 SnapPuller version vs. generation issue

2013-01-14 Thread Mark Miller
I've fixed this - thanks Gregg.

https://issues.apache.org/jira/browse/SOLR-4303

- Mark

On Jan 10, 2013, at 5:41 PM, Mark Miller markrmil...@gmail.com wrote:

 Hmm…I don't recall that change. We use the force, so SolrCloud certainly does 
 not depend on it.
 
 It seems like it might be a mistake - some dev code that got caught up with 
 the commit?
 
 I'm a little surprised it wouldn't trip any tests…I still have to read your 
 first email closely though.
 
 - Mark
 
 On Jan 10, 2013, at 4:49 PM, Gregg Donovan gregg...@gmail.com wrote:
 
 Thanks, Mark.
 
 The relevant commit on the solrcloud branch appears to be 1231134 and is
 focused on the recovery aspect of SolrCloud:
 
 http://svn.apache.org/viewvc?diff_format=hview=revisionrevision=1231134
 http://svn.apache.org/viewvc/lucene/dev/branches/solrcloud/solr/core/src/java/org/apache/solr/handler/SnapPuller.java?diff_format=hr1=1231133r2=1231134;
 
 I tried changed the check on our 4.0 test cluster to:
 
 boolean isFullCopyNeeded =
 IndexDeletionPolicyWrapper.getCommitTimestamp(commit) >= latestVersion
 || commit.getGeneration() >= latestGeneration || forceReplication;
 
 and that fixed our post-reindexing HTTP replication issues. But I'm not
 sure if that check works for all of the cases that SnapPuller is designed
 for.
 
 --Gregg
 
 On Thu, Jan 10, 2013 at 4:28 PM, Mark Miller markrmil...@gmail.com wrote:
 
 
 On Jan 10, 2013, at 4:11 PM, Gregg Donovan gregg...@gmail.com wrote:
 
 If the commitTimeMSec based check in Solr 4.0 is needed for SolrCloud,
 
 It's not. SolrCloud just uses the force option. I think this other change
 was made because Lucene stopped using both generation and version. I can
 try and look closer later - can't remember who made the change in Solr.
 
 - Mark
 



Re: RSS tutorial that comes with the apache-solr not indexing

2013-01-14 Thread bibhor
Hi,
I did try another RSS feed, from http://www.feedforall.com/sample.xml, but
it also didn't work and came back with the same message saying it indexed 0
documents.

This is my data from rss-data-config.xml

<dataConfig>
  <dataSource type="URLDataSource" />
  <document>
    <entity name="rsstest"
            pk="link"
            url="http://www.feedforall.com/sample.xml"
            processor="XPathEntityProcessor"
            forEach="/RDF/channel | /RDF/item"
            transformer="DateFormatTransformer">

      <field column="title" xpath="/RDF/item/title" />
      <field column="link" xpath="/RDF/item/link" />
    </entity>
  </document>
</dataConfig>







--
View this message in context: 
http://lucene.472066.n3.nabble.com/RSS-tutorial-that-comes-with-the-apache-solr-not-indexing-tp4033067p4033230.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: RSS tutorial that comes with the apache-solr not indexing

2013-01-14 Thread Steve Rowe
Hi bibhor,

I looked at http://rss.slashdot.org/Slashdot/slashdot and 
http://www.feedforall.com/sample.xml, and their top-level structure is:

<rss>
  <channel>
    ...

This doesn't match your <entity ... forEach="/RDF/channel" ...> or your
<field column="..." xpath="/RDF/item/..." />.

Steve

On Jan 14, 2013, at 1:02 PM, bibhor bib...@gmail.com wrote:

 Hi,
 I did try another RSS from here http://www.feedforall.com/sample.xml; but
 it also didnt work and came back with same message saying it indexed 0
 documents.
 
 This is my data from rss-data-config.xml
 
 <dataConfig>
     <dataSource type="URLDataSource" />
     <document>
         <entity name="rsstest"
                 pk="link"
                 url="http://www.feedforall.com/sample.xml"
                 processor="XPathEntityProcessor"
                 forEach="/RDF/channel | /RDF/item"
                 transformer="DateFormatTransformer">

             <field column="title" xpath="/RDF/item/title" />
             <field column="link" xpath="/RDF/item/link" />
         </entity>
     </document>
 </dataConfig>
 
 
 
 
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/RSS-tutorial-that-comes-with-the-apache-solr-not-indexing-tp4033067p4033230.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: configuring schema to match database

2013-01-14 Thread Gora Mohanty
On 14 January 2013 17:28, Jens Grivolla j+...@grivolla.net wrote:
 On 01/14/2013 12:50 PM, Gora Mohanty wrote:
[...]
 Doh! You are right, of course. Brainfart from my side.


 Ok, I was starting to wonder if I was the one missing something. Re-reading
 what I wrote I see I may have sounded a bit rude, that was not my intention,
 sorry.

Did not take it as rude, and in any case am willing to
tolerate a lot of impoliteness when someone shows me
that I was wrong.

Must have been half-asleep when I wrote my original
reply, and was then trying to defend it. At least that's
my story, and I am sticking to it :-)

Regards,
Gora


Re: RSS tutorial that comes with the apache-solr not indexing

2013-01-14 Thread bibhor
Hi Steve
Thank you for your help. After I updated the rss-data-config to following,
it worked.

<dataConfig>
    <dataSource type="URLDataSource" />
    <document>
        <entity name="rsstest"
                pk="link"
                url="http://www.feedforall.com/sample.xml"
                processor="XPathEntityProcessor"
                forEach="/rss/channel/item"
                transformer="DateFormatTransformer">

            <field column="title" xpath="/rss/channel/item/title" />
            <field column="link" xpath="/rss/channel/item/link" />
        </entity>
    </document>
</dataConfig>



--
View this message in context: 
http://lucene.472066.n3.nabble.com/RSS-tutorial-that-comes-with-the-apache-solr-not-indexing-tp4033067p4033254.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: performance improvements on ip look up query

2013-01-14 Thread Mikhail Khludnev
Hello Lee,

I suppose caching isn't efficient for this type of search. I can propose
a kind of trick:
if you index your docs ordered by the (STARTIP, ENDIP) tuple, it should make
the intersection faster. However, that's more a theoretical consideration than
a practical one.

A more practical approach is to encode the IP range in a single field and
search with a special query. I'm aware of a geospatial approach for this:
http://lucene.472066.n3.nabble.com/Modeling-openinghours-using-multipoints-td4025336.html
Unfortunately I never did it myself.
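For anyone reproducing the setup in this thread: the long values in queries like startIpNum:[* TO 180891652] are the usual dotted-quad-to-integer encoding of an IPv4 address. A minimal Python sketch (not from the original thread; the helper names are mine, the field names come from the schema quoted below):

```python
import socket
import struct

def ip_to_long(ip):
    """Dotted-quad IPv4 -> unsigned 32-bit int, the encoding behind
    the startIpNum/endIpNum values in the thread."""
    return struct.unpack("!I", socket.inet_aton(ip))[0]

def range_query(ip):
    # Build the same range filter as in the thread from a dotted-quad address.
    n = ip_to_long(ip)
    return "startIpNum:[* TO {0}] AND endIpNum:[{0} TO *]".format(n)

print(range_query("10.200.48.4"))
# startIpNum:[* TO 180891652] AND endIpNum:[180891652 TO *]
```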
On Wed, Jan 9, 2013 at 8:13 PM, Lee Carroll lee.a.carr...@googlemail.comwrote:

 Hi Otis

 The cache was a modest 4096, with a hit rate of 0.23 after a 24hr period.
 We doubled it and the hit rate went to 0.25. Our interpretation is that the IP
 is pretty much a cache-busting value, and
 cache size is not at play here.

 The q param is just startIpNum:[* TO 180891652] AND endIpNum:[180891652 TO
 *], so again our
 interpretation is that it gets little reuse.

 Could we re-formulate the query to be more performant?


 On 9 January 2013 12:56, Otis Gospodnetic otis.gospodne...@gmail.com
 wrote:

  Hi,
 
  Maybe your cache is too small?  How big is it and does the hit rate
 change
  if you make it bigger?
 
  Do any parts of the query repeat a lot? Maybe there is room for fq.
 
  Otis
  Solr  ElasticSearch Support
  http://sematext.com/
  On Jan 9, 2013 6:08 AM, Lee Carroll lee.a.carr...@googlemail.com
  wrote:
 
   Hi
  
   We are doing a lat/lon look up query using ip address.
  
   We have a 6.5 million document core of the following structure
   start ip block
   end ip block
   location id
   location_lat_lon
  
   the field defs are
    <types>
      <fieldType name="string" class="solr.StrField" sortMissingLast="true"
        omitNorms="true"/>
      <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8"
        omitNorms="true" positionIncrementGap="0"/>
      <fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8"
        omitNorms="true" positionIncrementGap="0"/>
      <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8"
        omitNorms="true" positionIncrementGap="0"/>
      <fieldType name="location" class="solr.LatLonType"
        subFieldSuffix="_coordinate"/>
    </types>
    <fields>
      <field name="startIp" type="string" indexed="true" stored="false"
        required="true"/>
      <field name="startIpNum" type="tlong" indexed="true" stored="false"
        required="true"/>
      <field name="endIpNum" type="tlong" indexed="true" stored="false"
        required="true"/>
      <field name="locId" type="string" indexed="true" stored="true"
        required="true"/>
      <field name="countryCode" type="string" indexed="true" stored="true"
        required="false"/>
      <field name="cityName" type="string" indexed="false" stored="true"
        required="false"/>
      <field name="latLon" type="location" indexed="true" stored="true"
        required="true"/>
      <field name="latitude" type="string" indexed="false" stored="true"
        required="true"/>
      <field name="longitude" type="string" indexed="false" stored="true"
        required="true"/>
      <dynamicField name="*_coordinate" type="tdouble" indexed="true"
        stored="false"/>
    </fields>
  
   the query at the moment is simply a range query
  
    q=startIpNum:[* TO 180891652] AND endIpNum:[180891652 TO *]
  
    We are seeing a full query cache with a low hit rate (0.2) and a high
    eviction rate, which makes sense given the use of the IP address in the
    query.

    Mean query time is 120.

    Is there a better way of structuring the core for this use case?
    I suspect our heap memory settings are conservative (1g) but we will need to
    convince our sys admins to change this (they are not ringing any resource
    alarm bells); it's just that the query is a little slow.
  
 




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Velocity in Multicore

2013-01-14 Thread Ramirez, Paul M (388J)
Hi,

I've been unable to get the velocity response writer to work in a multicore 
environment. Working from the examples distributed with Solr, I simply 
started from the multicore example and added a hello.vm into the 
core0/conf/velocity directory. I then updated solrconfig.xml to add a new 
request handler, as shown below. I've tried to use v.base_dir, with no success. 
Essentially, what I always end up with is the default Solr response. Has anyone 
been able to get the velocity response writer to work in a multicore 
environment? If so, could you point me to the documentation on how to do so?

hello.vm

Hello World!

solrconfig.xml
===
…
  <requestHandler name="/hello" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <!-- VelocityResponseWriter settings -->
      <str name="wt">velocity</str>
      <str name="v.template">hello</str>
      <!-- I've tried all the following in addition to not specifying any. -->
      <!--<str name="v.base_dir">core0/conf/velocity</str>-->
      <!--<str name="v.base_dir">conf/velocity</str>-->
      <!--<str name="v.base_dir">multicore/core0/conf/velocity</str>-->
    </lst>
  </requestHandler>
…



Regards,
Paul Ramirez


Re: POST query with non-ASCII to solr using httpclient wont work

2013-01-14 Thread Jie Sun
Unfortunately solrj is not an option here...
we will have to make a quick fix with a patch out in production.

I am still unable to make Solr (3.5) take a URL-encoded query. Again,
passing a non-URL-encoded query string works with non-ASCII (Chinese), but
it fails to return anything when sending the request with URL-encoded Chinese.

Any suggestions?
thanks
jie



--
View this message in context: 
http://lucene.472066.n3.nabble.com/POST-query-with-non-ASCII-to-solr-using-httpclient-wont-work-tp4032957p4033262.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: RSS tutorial that comes with the apache-solr not indexing

2013-01-14 Thread Gora Mohanty
On 15 January 2013 00:38, Alexandre Rafalovitch arafa...@gmail.com wrote:
 Is that something that needs to be updated in the example schema as well
 then?

The example rss-data-config.xml references
http://rss.slashdot.org/Slashdot/slashdot which
seems to be broken at the moment, at least
for me. This is the same URL as on the Slashdot
home page, so not sure what is going on.

Regards,
Gora


Re: theory of sets

2013-01-14 Thread Mikhail Khludnev
My answer, as usual - BlockJoin.
Index the group as a parent document, and every membership as a child doc. In
this case you somewhat denormalize your items - every item will be indexed
N times, where N is the number of groups it belongs to. Potentially this can
lead to a duplication problem, but you haven't mentioned that yet.

In this case a single field for sorting is enough.

Then you can search by ToChildQuery(GRP_NAME:foo), which gives you foo's
children.

BJQ-101:
http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html
http://blog.mikemccandless.com/2012/01/tochildblockjoinquery-in-lucene.html
https://issues.apache.org/jira/browse/SOLR-3076

I forgot to mention there is a price for the blazing speed: when you need to
change a membership, you need to reindex the whole block.


On Fri, Jan 4, 2013 at 1:10 AM, Uwe Reh r...@hebis.uni-frankfurt.de wrote:

 Hi,

 I'm looking for a tricky solution to a common problem. I have to handle a
 lot of items, and each could be a member of several groups.
 - OK, just add a field called 'member_of'

 No, that's not enough, because each group is sorted and each member has a
 sortstring for this group.
 - OK, still easy: add a dynamic field 'sortinfo_for_*' and fill it for
 each group membership.

 Yes, this works, but there are thousands of different groups; that many
 dynamic fields are probably a serious performance issue.
 - Well ...

 I'm looking for a smart way to answer the question "Find the members of
 group X and sort them by the sortstring for this group."

 One idea I had was to fill the 'member_of' field with composed entries
 (groupname + "_" + sortstring). Finding the members is easy with wildcards,
 but there seems to be no way to use the sortstring as a boost factor.

 Has anybody solved this problem?
 Any hints are welcome.

 Uwe




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: Search across a specified number of boundaries

2013-01-14 Thread Mikhail Khludnev
Mike,

When Lucene's Analyzer indexes the text, it adds positions into the index,
which are later used by SpanQueries. Have you considered the idea of a position
increment gap? E.g. the first sentence is indexed with word positions
0,1,2,3,..., the second sentence with 100,101,102,103,..., the third with
200,201,202,... Then applying some span constraint lets you search
across/inside sentences.
WDYT?
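A toy sketch of the gap idea in plain Python (an illustration only, not the Lucene API; the gap of 100 and whitespace tokenization are assumptions):

```python
def index_positions(sentences, gap=100):
    """Assign each token the position sentence_index * gap + token_index,
    imitating a position increment gap between sentences."""
    positions = {}
    for s, sentence in enumerate(sentences):
        for t, token in enumerate(sentence.lower().split()):
            positions.setdefault(token, []).append(s * gap + t)
    return positions

def within_n_sentences(positions, term_a, term_b, n, gap=100):
    """True if any occurrence of term_a is within n sentences of term_b."""
    return any(abs(pa // gap - pb // gap) <= n
               for pa in positions.get(term_a, [])
               for pb in positions.get(term_b, []))

pos = index_positions(["the quick fox", "jumped high", "over the lazy dog"])
print(within_n_sentences(pos, "fox", "dog", 3))  # True (2 sentences apart)
print(within_n_sentences(pos, "fox", "dog", 1))  # False
```

In Lucene itself the same effect would come from the analyzer's position increment gap plus a SpanNearQuery with a slop derived from the gap.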


On Sun, Jan 6, 2013 at 6:50 PM, Erick Erickson erickerick...@gmail.comwrote:

 Mike:

 I'm _really_ stretching here, but you might be able to do something
 interesting
  with payloads. Say each word had a payload with the sentence number and
 you _somehow_ made use of that information in a custom scorer. But like I
 said, I really have no good idea how to accomplish that...

 BTW, in future this kind of question is better asked on the user's list
 (either
 Lucene or Solr); this list is intended for discussing development work.

 Best
 Erick


 On Fri, Jan 4, 2013 at 1:02 PM, Mike Ree mike.ad...@olytech.net wrote:

 d terms that are in nearby sentences.

 IE:
 TermA NEAR3 TermB would find all TermA's that are within 3 sentences of
 TermB.

 Have found ways to find TermA within same sentence





-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: Question about GC logging timestamps

2013-01-14 Thread Mikhail Khludnev
Shawn,
you are welcome!

http://www.oracle.com/technetwork/java/gc-tuning-5-138395.html
The flag -XX:+PrintGCTimeStamps will additionally print a time stamp at the
start of each collection.



On Sun, Jan 6, 2013 at 6:54 AM, Michael Ryan mr...@moreover.com wrote:

 From my own experience, the timestamp seems to be logged at the start of
 the garbage collection.

 -Michael




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: Index data from multiple tables into Solr

2013-01-14 Thread Lance Norskog
Try all of the links under the collection name in the lower left-hand 
column. There are several administration and monitoring tools you may find useful.


On 01/14/2013 11:45 AM, hassancrowdc wrote:

OK, the stats are changing, so the data is indexed. But how can I query
this data, or how can I search it? E.g., would the command be
http://localhost:8983/solr/select?q=(any of my field columns from the table)?
Because whatever I put in my URL, it shows me an XML file but
numFound is always 0.


On Sat, Jan 12, 2013 at 1:24 PM, Alexandre Rafalovitch [via Lucene] 
ml-node+s472066n4032778...@n3.nabble.com wrote:


Have you tried the Admin interface yet? The one on :8983 port if you are
running default setup. That has a bunch of different stats you can look at
apart from a nice way of doing a query. I am assuming you are on Solr 4,
of
course.

Regards,
Alex.

On Fri, Jan 11, 2013 at 5:13 PM, hassancrowdc [hidden 
email]http://user/SendEmail.jtp?type=nodenode=4032778i=0wrote:



So, I followed all the steps and solr is working successfully, Can you
please tell me how i can see if my data is indexed or not? do i have to
enter specific url into my browser or anything. I want to make sure that
the data is indexed.




Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)







--
View this message in context: 
http://lucene.472066.n3.nabble.com/Index-data-from-multiple-tables-into-Solr-tp4032266p4033268.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: POST query with non-ASCII to solr using httpclient wont work

2013-01-14 Thread Uwe Reh

Hi Jie,

maybe there is a simple solution. When we used Tomcat as the servlet 
container for Solr, I noticed similar problems. Even with the hints from 
the Solr wiki about Unicode and Tomcat, I wasn't able to fix this.
So we switched back to Jetty; queries like q=allfields2%3A能力 are 
reliable now.


Uwe

BTW: I have no idea at all what these Japanese characters mean. So just 
let me append two of the 31 hits in our bibliographic catalog.



<doc>
  <str name="id">HEB052032124</str>
  <str name="raw_fullrecord">alg: 5203212
001@ $0205
001A $4:13-05-97
001B $t13:12:07.000$01999:10-06-10
001D $0:99-99-99
001U $0utf8
001X $00
002@ $0Aau
003@ $0052032124
007I $0NacsisBN09679884
010@ $ajpn
011@ $a1993
013H $0z
019@ $ajp
021A $ULatn$T01$aNōryoku kaihatsu no shisutemu$hYaguchi Hajime
021A $UJpan$T01$a@能力開発のシステム$h矢口新著
028A $ULatn$T01$9165745363$8Yaguchi, Hajime
028A $UJpan$T01$d新$a矢口
033A $ULatn$T01$pTokyo$nNōryoku Kaihatsu Kōgaku Sentaa
033A $UJpan$T01$p東久留米$n能力開発工学センター
034D $a274 S.
034M $aIll.
036E $aYaguchi Hajime senshū$l2
036F $l2$9052031527$8Yaguchi Hajime senshū$x12
037B $aSysteme zur Entwicklung der Fähigkeiten
046L $aIn japan. Schr.
...
247C/01 $9102595631$8351457-2 4/457Marburg, Universität Marburg, Bibliothek 
des Japan-Zentrums (BJZ)
  </str>
</doc>
<doc>
  <str name="id">HEB286840723</str>
  <str name="raw_fullrecord">alg: 28684072
001@ $03
001A $00030:04-01-12
001B $t22:29:11.000$01999:04-01-12
001C $t10:48:47.000$00030:04-01-12
001D $00030:04-01-12
001U $0utf8
001X $00
002@ $0Aau
003@ $0286840723
004A $A978-4-88319-546-6
007A $0286840723$aHEB
010@ $ajpn
011@ $a2010
021A $ULatn$T01$aShin kanzen masutā kanji nihongo nōryoku shiken ; N1$hIshii 
Reiko ...
021A $UJpan$T01$a新完全マスター漢字日本語能力試験 ; N1$h石井怜子 [ほか] 著
027A $ULatn$T01$aShin kanzen masutā kanji : nihongo nōryoku shiken ; enu ichi / 
Ishii Reiko ...
027A $UJpan$T01$a新完全マスター漢字 : 日本語能力試験 ; N1 / 石井怜子 [ほか] 著
028C $9230917593$8Ishii, Reiko
033A $ULatn$T01$pTōkyō$nSurīē nettowāku
033A $UJpan$T01$p東京$nスリーエーネットワーク
034D $aviii, 197, 21S.
034I $a26cm
044A $S4$aNihongokyōiku(Taigaikokujin)
045Z $aEI 4650
...
247C/01 $9102599157$8601220-6 30/220Frankfurt, Universität Frankfurt, 
Institut für Orientalische und Ostasiatische Philologien, Japanologie
  </str>
</doc>





Re: Index data from multiple tables into Solr

2013-01-14 Thread hassancrowdc
Thanks, I got it.

How can I integrate Solr with my website, so that I can use it for search?


On Mon, Jan 14, 2013 at 4:04 PM, Lance Norskog-2 [via Lucene] 
ml-node+s472066n4033291...@n3.nabble.com wrote:

 Try all of the links under the collection name in the lower left-hand
 columns. There several administration monitoring tools you may find
 useful.

 On 01/14/2013 11:45 AM, hassancrowdc wrote:

  ok stats are changing, so the data is indexed. But how can i do query
 with
  this data, or ow can i search it, like the command will be
  http://localhost:8983/solr/select?q=(any of my field column from
 table)?
  coz whatever i am putting in my url it shows me an xml file but the
  numFound are always 0?
 
 
  On Sat, Jan 12, 2013 at 1:24 PM, Alexandre Rafalovitch [via Lucene] 
  [hidden email] http://user/SendEmail.jtp?type=nodenode=4033291i=0
 wrote:
 
  Have you tried the Admin interface yet? The one on :8983 port if you
 are
  running default setup. That has a bunch of different stats you can look
 at
  apart from a nice way of doing a query. I am assuming you are on Solr
 4,
  of
  course.
 
  Regards,
  Alex.
 
  On Fri, Jan 11, 2013 at 5:13 PM, hassancrowdc [hidden email]
 http://user/SendEmail.jtp?type=nodenode=4032778i=0wrote:
 
 
  So, I followed all the steps and solr is working successfully, Can you
  please tell me how i can see if my data is indexed or not? do i have
 to
  enter specific url into my browser or anything. I want to make sure
 that
  the data is indexed.
 
 
 
  Personal blog: http://blog.outerthoughts.com/
  LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
  - Time is the quality of nature that keeps events from happening all at
  once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
 book)
 
 

 
 
 
 
  --
  View this message in context:
 http://lucene.472066.n3.nabble.com/Index-data-from-multiple-tables-into-Solr-tp4032266p4033268.html

  Sent from the Solr - User mailing list archive at Nabble.com.








--
View this message in context: 
http://lucene.472066.n3.nabble.com/Index-data-from-multiple-tables-into-Solr-tp4032266p4033296.html
Sent from the Solr - User mailing list archive at Nabble.com.

SolrCloud :: Adding replica :: Sync-up issue

2013-01-14 Thread Mishkin, Ernest
Hello,

I observed a rather weird issue with SolrCloud.

Using Solr 4.0 GA code.

Started with a 3-node Zookeeper ensemble (standalone) and a single Solr 
instance running single collection. numShards was set to 1 during collection 
creation (don't want sharding, just replication).
Everything worked fine.

Started another Solr instance for the same collection. Properly went through 
the steps realizing it needed to sync up (actual url values replaced with 
url):

12:50:59.152 INFO  [o.apache.solr.cloud.RecoveryStrategy] Starting recovery 
process.  core=users recoveringAfterStartup=true [RecoveryThread]
12:50:59.152 INFO  [o.a.solr.servlet.SolrDispatchFilter ] user.dir=/home/seg 
[localhost-startStop-1]
12:50:59.153 INFO  [o.a.solr.servlet.SolrDispatchFilter ] 
SolrDispatchFilter.init() done [localhost-startStop-1]
12:50:59.189 INFO  [o.apache.solr.cloud.RecoveryStrategy] ## 
startupVersions=[] [RecoveryThread]
12:50:59.198 INFO  [o.apache.solr.cloud.RecoveryStrategy] Attempting to 
PeerSync from url core=users - recoveringAfterStartup=true [RecoveryThread]
12:50:59.201 INFO  [o.a.s.c.solrj.impl.HttpClientUtil   ] Creating new http 
client, 
config:maxConnectionsPerHost=20maxConnections=1connTimeout=3socketTimeout=3retry=false
 [RecoveryThread]
12:50:59.377 INFO  [org.apache.solr.update.PeerSync ] PeerSync: core=users 
url=url START replicas=[url] nUpdates=100 [RecoveryThread]
12:50:59.377 DEBUG [org.apache.solr.update.PeerSync ] PeerSync: core=users 
url=urlsolr startingVersions=0 [] [RecoveryThread]
12:50:59.390 WARN  [org.apache.solr.update.PeerSync ] no frame of reference 
to tell of we've missed updates [RecoveryThread]
12:50:59.390 INFO  [o.apache.solr.cloud.RecoveryStrategy] PeerSync Recovery was 
not successful - trying replication. core=users [RecoveryThread]
12:50:59.390 INFO  [o.apache.solr.cloud.RecoveryStrategy] Starting Replication 
Recovery. core=users [RecoveryThread]
12:50:59.422 INFO  [o.a.solr.common.cloud.ZkStateReader ] A cluster state 
change has occurred - updating... [localhost-startStop-1-EventThread]
12:50:59.575 INFO  [o.a.s.c.solrj.impl.HttpClientUtil   ] Creating new http 
client, 
config:maxConnections=128maxConnectionsPerHost=32followRedirects=false 
[RecoveryThread]
12:51:02.742 INFO  [o.apache.solr.cloud.RecoveryStrategy] Begin buffering 
updates. core=users [RecoveryThread]
12:51:02.742 INFO  [org.apache.solr.update.UpdateLog] Starting to buffer 
updates. FSUpdateLog{state=ACTIVE, tlog=null} [RecoveryThread]
12:51:02.743 INFO  [o.apache.solr.cloud.RecoveryStrategy] Attempting to 
replicate from url. core=users [RecoveryThread]
12:51:02.743 INFO  [o.a.s.c.solrj.impl.HttpClientUtil   ] Creating new http 
client, 
config:maxConnections=128maxConnectionsPerHost=32followRedirects=false 
[RecoveryThread]
12:51:02.762 INFO  [o.a.s.c.solrj.impl.HttpClientUtil   ] Creating new http 
client, 
config:connTimeout=5000socketTimeout=2allowCompression=falsemaxConnections=1maxConnectionsPerHost=1
 [RecoveryThread]
12:51:02.774 INFO  [org.apache.solr.handler.SnapPuller  ]  No value set for 
'pollInterval'. Timer Task not started. [RecoveryThread]
12:51:02.781 INFO  [org.apache.solr.core.SolrCore   ] 
SolrDeletionPolicy.onInit: commits:num=1

commit{dir=NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@/solr/users/data/index
 lockFactory=org.apache.lucene.store.NativeFSLockFactory@6e28575; 
maxCacheMB=48.0 
maxMergeSizeMB=4.0),segFN=segments_1,generation=1,filenames=[segments_1] 
[RecoveryThread]
12:51:02.782 INFO  [org.apache.solr.core.SolrCore   ] newest commit = 1 
[RecoveryThread]
12:51:02.782 DEBUG [o.apache.solr.update.SolrIndexWriter] Opened Writer 
DirectUpdateHandler2 [RecoveryThread]
12:51:02.784 INFO  [org.apache.solr.update.UpdateHandler] start 
commit{flags=0,_version_=0,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false}
 [RecoveryThread]
12:51:02.785 DEBUG [org.apache.solr.update.UpdateLog] TLOG: preCommit 
[RecoveryThread]
12:51:02.823 INFO  [org.apache.solr.core.SolrCore   ] 
SolrDeletionPolicy.onCommit: commits:num=2

commit{dir=NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@/solr/users/data/index
 lockFactory=org.apache.lucene.store.NativeFSLockFactory@6e28575; 
maxCacheMB=48.0 
maxMergeSizeMB=4.0),segFN=segments_1,generation=1,filenames=[segments_1]

commit{dir=NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@/solr/users/data/index
 lockFactory=org.apache.lucene.store.NativeFSLockFactory@6e28575; 
maxCacheMB=48.0 
maxMergeSizeMB=4.0),segFN=segments_2,generation=2,filenames=[segments_2] 
[RecoveryThread]
12:51:02.824 INFO  [org.apache.solr.core.SolrCore   ] newest commit = 2 
[RecoveryThread]
12:51:02.828 INFO  [o.a.solr.search.SolrIndexSearcher   ] Opening 
Searcher@5947fe65 main [RecoveryThread]

12:51:02.837 DEBUG [org.apache.solr.update.UpdateLog] TLOG: postCommit 
[RecoveryThread]
12:51:02.837 INFO  

Need another way to boost relevance of recent content

2013-01-14 Thread Shawn Heisey
I implemented the date boosting function outlined here, placed into the 
boost parameter in the request handler:


http://wiki.apache.org/solr/FunctionQuery#Date_Boosting

Today it was reported to me that this boosting is producing terrible 
results.  A close look at the description reveals that this isn't so 
much a boost on new content as it is a negative boost on old content. 
This isn't what I want.


What I need to have happen is that content added today gets a small (and 
easily configurable) boost, content added yesterday gets a slightly 
smaller boost, tapering down to approximately two to four weeks in the 
past, at which point there would be no boost at all.  A document that's 
a month old would have the same boost (none) as a document that's 50 
years old.


Can someone figure out the formula for this?

Thanks,
Shawn


RE: Need another way to boost relevance of recent content

2013-01-14 Thread Markus Jelsma
Hi,

Depending on use case the functions max, min, scale and map can be used really 
well to regulate the output of recip. Check their docs and you'll surely work 
it out. Perhaps scale will work best for you.

Cheers
 
-Original message-
 From:Shawn Heisey s...@elyograg.org
 Sent: Mon 14-Jan-2013 22:50
 To: solr-user@lucene.apache.org
 Subject: Need another way to boost relavence of recent content
 
 I implemented the date boosting function outline here, placed into the 
 boost parameter in the request handler:
 
 http://wiki.apache.org/solr/FunctionQuery#Date_Boosting
 
 Today it was reported to me that this boosting is producing terrible 
 results.  A close look at the description reveals that this isn't so 
 much a boost on new content as it is a negative boost on old content. 
 This isn't what I want.
 
 What I need to have happen is that content added today gets a small (and 
 easily configurable) boost, content added yesterday gets a slightly 
 smaller boost, tapering down to approximately two to four weeks in the 
 past, at which point there would be no boost at all.  A document that's 
 a month old would have the same boost (none) as a document that's 50 
 years old.
 
 Can someone figure out the formula for this?
 
 Thanks,
 Shawn
 



I/O exception (java.net.SocketException) caught when processing request: Connection reset

2013-01-14 Thread Joe
I have a multi-threaded application in solrj 4. The threads (max 25) share
one connection to HttpSolrServer. Each thread is running one query. This
worked fine for a while, until it finally crashed with the following
messages: 

Jan 12, 2013 12:52:15 PM org.apache.http.impl.client.DefaultRequestDirector
tryExecute
INFO: Retrying request
Jan 12, 2013 12:52:15 PM org.apache.http.impl.client.DefaultRequestDirector
tryExecute
INFO: I/O exception (java.net.SocketException) caught when processing
request: Connection reset

I would like to catch this exception and reset the connection to the server.
I don't get the whole stack trace with the above message, so I'm not sure
where to do this. The only place I reference the server is for making a
query with: 

server.query( query )

But the only exception this throws is SolrServerException, which I'm
currently handling. 

Any suggestions would be greatly appreciated. 

FYI, this is how I'm setting up the initial server connection:

server = new HttpSolrServer(url);
server.setSoTimeout(0);
server.setDefaultMaxConnectionsPerHost(50);
server.setMaxTotalConnections(128);




--
View this message in context: 
http://lucene.472066.n3.nabble.com/I-O-exception-java-net-SocketException-caught-when-processing-request-Connection-reset-tp4033309.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Need another way to boost relevance of recent content

2013-01-14 Thread Shawn Heisey

On 1/14/2013 2:56 PM, Markus Jelsma wrote:

Depending on use case the functions max, min, scale and map can be used really 
well to regulate the output of recip. Check their docs and you'll surely work 
it out. Perhaps scale will work best for you.


I need someone to sanity check my work here.

Here's my existing boost:

<str name="boost">recip(ms(NOW/DAY,pd),3.16e-11,1,1)</str>

After a careful look at your advice and the functions available, this is 
what I have come up with:


min(recip(abs(ms(NOW/HOUR,pd)),3.85e-10,1.25,1),0.625)

To get the second value for the recip function, I figured out how many
milliseconds are in 30 days, then inverted that.  If I understand
everything correctly, this will result in boost values from 1.25 for
docs created right now down to 0.625 for docs created >= 30 days ago.  If
I need to adjust 1.25 to X, then I also need to adjust the 0.625 value to
0.5 times X.
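A quick numeric check of the recip part, as a plain-Java sketch (not a Solr query; Solr's recip(x,m,a,b) is a/(m*x+b)). One caveat: min() keeps the smaller of its arguments, so min(recip(...),0.625) would clamp brand-new docs down to 0.625; a floor at 0.625 for old docs would be max(recip(...),0.625).

```java
public class BoostSanityCheck {
    // Solr's recip(x, m, a, b) = a / (m*x + b).
    static double recip(double x, double m, double a, double b) {
        return a / (m * x + b);
    }

    public static void main(String[] args) {
        double thirtyDaysMs = 30.0 * 24 * 3600 * 1000; // ~2.592e9 ms
        System.out.println(recip(0, 3.85e-10, 1.25, 1));            // 1.25 (doc created now)
        System.out.println(recip(thirtyDaysMs, 3.85e-10, 1.25, 1)); // ~0.626 (30 days old)
    }
}
```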


Does that look OK?

Thanks,
Shawn



Re: RSS tutorial that comes with the apache-solr not indexing

2013-01-14 Thread Steve Rowe
Yes, thanks Alex, I've fixed 
solr/example/example-DIH/solr/rss/conf/rss-data-config.xml

On Jan 14, 2013, at 2:08 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:

 Is that something that needs to be updated in the example schema as well
 then?
 
 Regards,
   Alex.
 
 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all at
 once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
 
 
 On Mon, Jan 14, 2013 at 1:51 PM, bibhor bib...@gmail.com wrote:
 
 Hi Steve
 Thank you for your help. After I updated the rss-data-config to following,
 it worked.
 
 <dataConfig>
   <dataSource type="URLDataSource" />
   <document>
     <entity name="rsstest"
             pk="link"
             url="http://www.feedforall.com/sample.xml"
             processor="XPathEntityProcessor"
             forEach="/rss/channel/item"
             transformer="DateFormatTransformer">
       <field column="title" xpath="/rss/channel/item/title" />
       <field column="link" xpath="/rss/channel/item/link" />
     </entity>
   </document>
 </dataConfig>
 
 
 
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/RSS-tutorial-that-comes-with-the-apache-solr-not-indexing-tp4033067p4033254.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 



Is there any way to check what index-time document boost value is?

2013-01-14 Thread Alexandre Rafalovitch
Hello,

I have indexed a document with an assigned document-level boost factor. Is
there any way to double-check that the boost factor is actually
recorded/used?

I tried debug.explain.structured but it does not seem to have it.

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


Re: incorrect solr update behavior

2013-01-14 Thread Gary Yngve
Of course, as soon as I post this, I discover this:

https://issues.apache.org/jira/browse/SOLR-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13537900#comment-13538174

i'll give this patch a spin in the morning.

(this is not an example of how to use antecedents :))

-g


On Mon, Jan 14, 2013 at 6:27 PM, Gary Yngve gary.yn...@gmail.com wrote:

 Posting this

 <?xml version="1.0" encoding="UTF-8"?><add><doc><field name="nickname_s"
 update="set">blah</field><field name="tags_ss"
 update="add">qux</field><field name="tags_ss"
 update="add">quux</field><field name="id">foo</field></doc></add>

 to an existing doc with foo and bar tags
 results in tags_ss containing
 <arr name="tags_ss">
   <str>{add=qux}</str>
   <str>{add=quux}</str>
 </arr>

 whereas posting this

 <?xml version="1.0" encoding="UTF-8"?><add><doc><field name="nickname_s"
 update="set">blah</field><field name="tags_ss"
 update="add">qux</field><field name="id">foo</field></doc></add>

 results in the expected behavior:
 <arr name="tags_ss">
   <str>foo</str>
   <str>bar</str>
   <str>qux</str>
 </arr>

 Any ideas?

 Thanks,
 Gary



RE: Solr 3.5 and sharding

2013-01-14 Thread Jean-Sebastien Vachon
Ok that was my first thought... thanks for the confirmation

-Original Message-
From: Michael Ryan [mailto:mr...@moreover.com] 
Sent: January-14-13 10:06 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr 3.5 and sharding

If you have the same documents -- with the same uniqueKey -- across multiple 
shards, the count will not be what you expect. You'll need to ensure that each 
document exists only on a single shard.
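One way to guarantee that, sketched below; Solr 3.5 leaves routing to the indexing client, so the hash-mod scheme and names here are illustrative assumptions, not something from this thread:

```java
public class ShardRouter {
    // Deterministically assigns a uniqueKey to one of numShards shards,
    // so a given document is only ever indexed to a single shard.
    static int shardFor(String uniqueKey, int numShards) {
        return Math.floorMod(uniqueKey.hashCode(), numShards);
    }

    public static void main(String[] args) {
        System.out.println(shardFor("job-12345", 4)); // always the same shard
    }
}
```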

-Michael

-Original Message-
From: Jean-Sebastien Vachon [mailto:jean-sebastien.vac...@wantedanalytics.com]
Sent: Monday, January 14, 2013 9:59 AM
To: solr-user@lucene.apache.org
Subject: Solr 3.5 and sharding

Hi,

I'm setting up a small Solr installation consisting of 1 master node and 4
shards. For now, all four shards contain the exact same data. When I query
each individual shard for the word `java` I receive the same number of docs
(as expected). However, when I go through the master node using the shards
parameter, the number of results is off by a few documents. There is nothing
special in my setup, so I'm looking for hints on why I am getting this
problem.

Thanks



Re: I/O exception (java.net.SocketException) caught when processing request: Connection reset

2013-01-14 Thread Otis Gospodnetic
Hi,

I suspect you might find some information about the cause on the server
side, in your container's logs.  I'd look there for the real source of the
problem before trying to just reconnect from the client side.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Mon, Jan 14, 2013 at 5:06 PM, Joe joe.pol...@gmail.com wrote:

 I have a multi-threaded application in solrj 4. The threads (max 25) share
 one connection to HttpSolrServer. Each thread is running one query. This
 worked fine for a while, until it finally crashed with the following
 messages:

 Jan 12, 2013 12:52:15 PM org.apache.http.impl.client.DefaultRequestDirector
 tryExecute
 INFO: Retrying request
 Jan 12, 2013 12:52:15 PM org.apache.http.impl.client.DefaultRequestDirector
 tryExecute
 INFO: I/O exception (java.net.SocketException) caught when processing
 request: Connection reset

 I would like to catch this exception and reset the connection to the
 server.
 I don't get the whole stack trace with the above message, so I'm not sure
 where to do this. The only place I reference the server is for making a
 query with:

 server.query( query )

 But the only exception this throws is SolrServerException, which I'm
 currently handling.

 Any suggestions would be greatly appreciated.

 FYI, this is how I'm setting up the initial server connection:

 server = new HttpSolrServer(url);
 server.setSoTimeout(0);
 server.setDefaultMaxConnectionsPerHost(50);
 server.setMaxTotalConnections(128);




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/I-O-exception-java-net-SocketException-caught-when-processing-request-Connection-reset-tp4033309.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: performing a boolean query (OR) with a large number of terms

2013-01-14 Thread Otis Gospodnetic
Hi,

Also have a look at mm=0% acting as OR if you end up using dismax:
http://wiki.apache.org/solr/ExtendedDisMax#mm_.28Minimum_.27Should.27_Match.29

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Thu, Jan 10, 2013 at 7:47 AM, Erick Erickson erickerick...@gmail.com wrote:

 No, you're pretty much on track. You can also just include the field
 multiple times if you want, itemModelNoExactMatchStr:123-4567 OR
 itemMOdelNoExactMatchStr:345-034985

 But
 itemModelNoExactMatchStr:(123-4567 OR 345-034985) works just as well and is
 more compact.

  15 terms is actually quite short by Solr standards. The default cap is 1024
  boolean clauses, which you can change; it's just there to catch
  pathological (usually machine-generated) huge queries and make sure you
  consciously decide that such a query is OK.

 Best
 Erick


 On Wed, Jan 9, 2013 at 4:58 PM, geeky2 gee...@hotmail.com wrote:

  hello,
 
  environment: solr 3.5
 
  i have a requirement to perform a boolean query (like the example below)
  with a large number of terms.
 
  the number of terms could be 15 or possibly larger.
 
  after looking over several theads and the smiley book - i think i just
 have
  include the parens and string all of the terms together with OR's
 
  i just want to make sure that i am not missing anything.
 
  is there a better or more efficient way of doing this?
 
  http://server:port/dir/core1/select?qt=modelItemNoSearch&q=itemModelNoExactMatchStr:%285-100-NGRT7%20OR%205-10-10MS7%20OR%20404%29&rows=30&debugQuery=on&rows=40
 
 
  thx
  mark
 
 
 
 
  --
  View this message in context:
 
 http://lucene.472066.n3.nabble.com/performing-a-boolean-query-OR-with-a-large-number-of-terms-tp4032039.html
  Sent from the Solr - User mailing list archive at Nabble.com.
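The compact form Erick describes can be generated from a term list; a minimal plain-Java sketch (field name taken from the thread; escaping of query syntax characters in terms is left out):

```java
import java.util.Arrays;
import java.util.List;

public class OrQueryBuilder {
    // Builds the compact form field:(t1 OR t2 OR ...).
    static String orQuery(String field, List<String> terms) {
        return field + ":(" + String.join(" OR ", terms) + ")";
    }

    public static void main(String[] args) {
        System.out.println(orQuery("itemModelNoExactMatchStr",
                Arrays.asList("123-4567", "345-034985")));
        // prints itemModelNoExactMatchStr:(123-4567 OR 345-034985)
    }
}
```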
 



Re: Is there any way to check what index-time document boost value is?

2013-01-14 Thread Jack Krupansky
The norm function query gives you the combination of the index-time boost 
and length-normalization. And it's a low-resolution approximation at that. 
That's all that is stored, so that's as good as you can get.


See:
http://wiki.apache.org/solr/FunctionQuery#norm

I believe that norm should show up in the debug explain.

And for a description of the calculation of that norm factor, see:
http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
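To see how low-resolution the stored norm is, here is a from-memory sketch of Lucene's one-byte encoding (SmallFloat.floatToByte315: 3 mantissa bits, zero-exponent point 15); treat it as illustrative, the authoritative code is org.apache.lucene.util.SmallFloat:

```java
public class NormResolutionDemo {
    // Sketch of the 8-bit norm encoding used by Lucene's default Similarity.
    static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= ((63 - 15) << 3)) {
            return (bits <= 0) ? (byte) 0 : (byte) 1;   // underflow
        }
        if (smallfloat >= ((63 - 15) << 3) + 0x100) {
            return -1;                                  // overflow
        }
        return (byte) (smallfloat - ((63 - 15) << 3));
    }

    static float byte315ToFloat(byte b) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        // Distinct boosts can collapse to the same stored byte:
        System.out.println(byte315ToFloat(floatToByte315(2.5f)));  // 2.5
        System.out.println(byte315ToFloat(floatToByte315(2.7f)));  // also 2.5
    }
}
```

This is why two nearby index-time boosts may be indistinguishable in the explain output.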

-- Jack Krupansky

-Original Message- 
From: Alexandre Rafalovitch

Sent: Monday, January 14, 2013 9:13 PM
To: solr-user@lucene.apache.org
Subject: Is there any way to check what index-time document boost value is?

Hello,

I have indexed a document with an assigned document-level boost factor. Is
there any way to double-check that the boost factor is actually
recorded/used?

I tried debug.explain.structured but it does not seem to have it.

Regards,
  Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book) 



Re: core.SolrCore - java.io.FileNotFoundException

2013-01-14 Thread Jun Wang
The problem occurred again recently; this time the exception is:

2013-01-14 10:17:23,865 ERROR core.SolrCore -
java.io.FileNotFoundException:
/home/admin/index/core_p_shard4/index/_1ozb.fnm (No such file or directory)
at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:216)
at
org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:222)
at
org.apache.lucene.store.NRTCachingDirectory.openInput(NRTCachingDirectory.java:232)
at
org.apache.lucene.codecs.lucene40.Lucene40FieldInfosReader.read(Lucene40FieldInfosReader.java:52)
at
org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:101)
at
org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:57)
at
org.apache.lucene.index.ReadersAndLiveDocs.getReader(ReadersAndLiveDocs.java:120)
at
org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:214)
at
org.apache.lucene.index.IndexWriter.applyAllDeletes(IndexWriter.java:3010)
at
org.apache.lucene.index.DocumentsWriter.applyAllDeletes(DocumentsWriter.java:180)
at
org.apache.lucene.index.DocumentsWriter.postUpdate(DocumentsWriter.java:310)
at
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:386)
at
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1445)
at
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:210)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
at
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:448)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:325)
at
org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:230)
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:157)


But with INFOSTREAM enabled, I got more logs. Here is the part of the log
that I think is relevant to this problem, ordered by time.

1475075 IW 1 [Sun Jan 13 13:04:04 PST 2013; http-0.0.0.0-8080-11]: commit:
pendingCommit != null
1475076 IW 1 [Sun Jan 13 13:04:04 PST 2013; http-0.0.0.0-8080-11]: commit:
wrote segments file segments_1o
1475077 IFD 1 [Sun Jan 13 13:04:04 PST 2013; http-0.0.0.0-8080-11]: now
checkpoint _320(4.0.0.2):C10 _32e(4.0.0.2):C89194 _32n(4.0.0.2):C98196
_333(4.0.0.2):C96199 _32m(4.0.0.2):C1 _32c(4.0.0.2):C1 _32q(4.0.0.2):C1
_32t(4.0.0.2):C1 _326(4.0.0.2):C53 _32k(4.0.0.2):C8448/1 _332(4.0.0.2):C1
_335(4.0.0.2):C1 _336(4.0.0.2):C5799 _32r(4.0.0.2):C2477 _32v(4.0.0.2):C9042
_32x(4.0.0.2):C43 _334(4.0.0.2):C2328 _32z(4.0.0.2):C4035
_32o(4.0.0.2):C7201 _32s(4.0.0.2):C8020 [20 segments ; isCommit = true]
1475078 IFD 1 [Sun Jan 13 13:04:04 PST 2013; http-0.0.0.0-8080-11]:
deleteCommits: now decRef commit segments_1n
.
1490745 DWPT 0 [Mon Jan 14 09:21:37 PST 2013; http-0.0.0.0-8080-17]: flush
postings as segment _1ozb numDocs=1
.
1491090 IFD 0 [Mon Jan 14 09:21:37 PST 2013; http-0.0.0.0-8080-4]: refresh
[prefix=null]: removing newly created unreferenced file _1ozb.fdx
1491091 IFD 0 [Mon Jan 14 09:21:37 PST 2013; http-0.0.0.0-8080-4]: delete
_1ozb.fdx
.
1491152 IFD 0 [Mon Jan 14 09:21:37 PST 2013; http-0.0.0.0-8080-4]: refresh
[prefix=null]: removing newly created unreferenced file _1ozb.fnm
1491153 IFD 0 [Mon Jan 14 09:21:37 PST 2013; http-0.0.0.0-8080-4]: delete
_1ozb.fnm
.
1491301 DW 0 [Mon Jan 14 09:21:37 PST 2013; http-0.0.0.0-8080-16]:
publishFlushedSegment seg-private deletes=null
1491302 IW 0 [Mon Jan 14 09:21:37 PST 2013; http-0.0.0.0-8080-16]:
publishFlushedSegment
1491303 BD 0 [Mon Jan 14 09:21:37 PST 2013; http-0.0.0.0-8080-16]: push
deletes 10247 deleted terms (unique count=10247) bytesUsed=49152
delGen=15715 packetCount=4599 totBytesUsed=117810176
1491304 IW 0 [Mon Jan 14 09:21:37 PST 2013; http-0.0.0.0-8080-16]: publish
sets newSegment delGen=15716 seg=_1ozb(4.0.0.2):C1
1491305 IFD 0 [Mon Jan 14 09:21:37 PST 2013; http-0.0.0.0-8080-16]: now
checkpoint _1oyu(4.0.0.2):C8992 _1oyv(4.0.0.2):C2030 _1oyz(4.0.0.2):C8294
_1oyx(4.0.0.2):C2031 _1oyw(4.0.0.2):C259 _1oz3(4.0.0.2):C8375
_1oz1(4.0.0.2):C2836 _1oz5(4.0.0.2):C8231 _1oyy(4.0.0.2):C29
_1oz4(4.0.0.2):C2988 _1oz8(4.0.0.2):C1 _1ozb(4.0.0.2):C1 [12 segments ;
isCommit = false]
1491306 DW 0 [Mon Jan 14 09:21:37 PST 2013; http-0.0.0.0-8080-16]: force
apply deletes bytesUsed=118015987 vs ramBuffer=1.34217728E8
1491307 BD 0 [Mon Jan 14 09:21:37 PST 2013; http-0.0.0.0-8080-2]:
applyDeletes: infos=[_1oyu(4.0.0.2):C8992, _1oyv(4.0.0.2):C2030,
_1oyz(4.0.0.2):C8294, _1oyx(4.0.0.2):C2031, _1oyw(4.0.0.2):C259,
_1oz3(4.0.0.2):C8375, 

Regarding Copyfield

2013-01-14 Thread anurag.jain
hi

In my copyField setup I am not copying first_name, last_name, etc., but the
dest="text" field is still showing first_name etc. in auto-suggestion mode.

My copyField declarations are:
   <copyField source="percentage" dest="text"/>
   <copyField source="university_name" dest="text"/>
   <copyField source="course_name" dest="text"/>
   ...

and the fields are:
<field name="id" type="text_general" indexed="true" stored="true"
required="true" multiValued="false" />
   <field name="first_name" type="text_general" indexed="false"
stored="true"/>
   <field name="last_name" type="text_general" indexed="false"
stored="true"/>
   <field name="date_of_birth" type="text_general" indexed="false"
stored="true"/>
   <field name="state_name" type="text_general" indexed="false"
stored="true"/>
   <field name="mobile_no" type="text_general" indexed="false"
stored="true"/>
   ...


I also want to make my own field like text, named autosuggest, but that is
also not working for auto-suggestion.
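If the goal is a dedicated suggestion field, here is a minimal schema sketch (the field type and which sources to copy are assumptions, not taken from the poster's schema):

```xml
<!-- Only fields copied here feed suggestions; leave first_name etc. out. -->
<field name="autosuggest" type="text_general" indexed="true" stored="false"
       multiValued="true"/>
<copyField source="university_name" dest="autosuggest"/>
<copyField source="course_name" dest="autosuggest"/>
```

The suggester component would then be pointed at autosuggest instead of text.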



Please reply; this is urgent.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Regarding-Copyfield-tp4033385.html
Sent from the Solr - User mailing list archive at Nabble.com.


problem in Velocity spell output

2013-01-14 Thread anurag.jain
If I give the input antrag, it shows the following line:

Did you mean
{collationQuery=anurag,hits=1,misspellingsAndCorrections={aturag=anurag}}? 


I want the output in the format:

Did you mean anurag ? 


How can I solve this? Please give me a solution.

Thanks in advance

Reply please urgent



--
View this message in context: 
http://lucene.472066.n3.nabble.com/problem-in-Velocity-spell-output-tp4033390.html
Sent from the Solr - User mailing list archive at Nabble.com.