[ http://issues.apache.org/jira/browse/NUTCH-356?page=comments#action_12431548 ]
Enis Soztutar commented on NUTCH-356:
-
I observed strange behaviour when one of the plug-ins could not be included.
For example, the plugin system fails to
a url tokenizer implementation for tokenizing index fields : url and host
--
Key: NUTCH-389
URL: http://issues.apache.org/jira/browse/NUTCH-389
Project: Nutch
Issue
[ http://issues.apache.org/jira/browse/NUTCH-389?page=all ]
Enis Soztutar updated NUTCH-389:
Attachment: urlTokenizer.diff
patch for url tokenization
a url tokenizer implementation for tokenizing index fields : url and host
[ http://issues.apache.org/jira/browse/NUTCH-389?page=all ]
Enis Soztutar updated NUTCH-389:
Description:
NutchAnalysis.jj tokenizes the input by treating and _ as non-token
separators, which is not appropriate in the case of URLs. So I have
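As a minimal sketch of URL-specific tokenization, assuming the goal is to split on every non-alphanumeric character (including `_`, `.`, `/` and `-`) and to lowercase the tokens, the hypothetical class below is illustrative only; it is not the code from urlTokenizer.diff or NutchAnalysis.jj.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not the NUTCH-389 patch): splits a URL or host name
 * into lowercase tokens, treating every non-letter, non-digit character
 * (such as '.', '/', ':', '-' and '_') as a token separator.
 */
final class UrlTokenizerSketch {
    private UrlTokenizerSketch() {}

    static List<String> tokenize(String url) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < url.length(); i++) {
            char c = url.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // Any separator character closes the token built so far.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString()); // flush the trailing token
        }
        return tokens;
    }
}
```

For example, `http://www.apache.org/nutch_docs/index.html` yields `http`, `www`, `apache`, `org`, `nutch`, `docs`, `index`, `html`.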
[ http://issues.apache.org/jira/browse/NUTCH-389?page=comments#action_12445512 ]
Enis Soztutar commented on NUTCH-389:
-
Otis, you can test the tokenizer using the TestUrlTokenizer JUnit test case, and
you can test the NutchDocumentTokenizer
[ http://issues.apache.org/jira/browse/NUTCH-389?page=all ]
Enis Soztutar updated NUTCH-389:
Attachment: urlTokenizer-improved.diff
This is an improvement and a minor bug fix over the previous url tokenizer.
This version first replaces characters,
[ http://issues.apache.org/jira/browse/NUTCH-393?page=comments#action_12447787 ]
Enis Soztutar commented on NUTCH-393:
-
Also, IndexingException is caught by the Indexer, in which case the whole
document is not added to the writer (the
[ http://issues.apache.org/jira/browse/NUTCH-289?page=all ]
Enis Soztutar updated NUTCH-289:
Attachment: ipInCrawlDatumDraftV5.1.patch
The version 5 patch does not run on the current build, so I have fixed it and
resent the patch (did not change any
[ http://issues.apache.org/jira/browse/NUTCH-251?page=all ]
Enis Soztutar updated NUTCH-251:
Attachment: Nutch-251-AdminGUI.tar.gz
I have updated the patch written by Stephan.
This version works with Nutch-0.9-dev and hadoop-0.7.1 (current version of
Top Level Domains Indexing / Scoring
Key: NUTCH-439
URL: https://issues.apache.org/jira/browse/NUTCH-439
Project: Nutch
Issue Type: New Feature
Components: indexer
Affects Versions: 0.9.0
[ https://issues.apache.org/jira/browse/NUTCH-439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Enis Soztutar updated NUTCH-439:
Attachment: tld_plugin_v1.0.patch
This is a plugin implementation for indexing and scoring top
[ https://issues.apache.org/jira/browse/NUTCH-439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Enis Soztutar updated NUTCH-439:
Attachment: tld_plugin_v1.1.patch
I had forgotten to unset http.agent.name in the v1.0
Domain Indexing / Query Filter
--
Key: NUTCH-445
URL: https://issues.apache.org/jira/browse/NUTCH-445
Project: Nutch
Issue Type: New Feature
Components: indexer, searcher
Affects Versions: 0.9.0
[ https://issues.apache.org/jira/browse/NUTCH-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Enis Soztutar updated NUTCH-445:
Attachment: index_query_domain_v1.0.patch
Patch for index-domain and query-domain plugins.
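An index-domain plugin needs the domain part of a host name. A minimal sketch, assuming a plain set of known suffixes (in the spirit of the domain.suffixes file attached to NUTCH-439), picks the longest matching suffix and keeps one more label in front of it. `DomainSketch` and `domainOf` are hypothetical names, not Nutch's domain utilities.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical sketch: derives the domain of a host from a set of known
 * suffixes such as "com" or "co.uk". The longest matching suffix wins and
 * the domain is that suffix plus the label immediately before it.
 */
final class DomainSketch {
    private DomainSketch() {}

    static String domainOf(String host, Set<String> suffixes) {
        String lower = host.toLowerCase(Locale.ROOT);
        String[] labels = lower.split("\\.");
        // start = 1 gives the longest candidate suffix; always keep one label in front.
        for (int start = 1; start < labels.length; start++) {
            String candidate = String.join(".", Arrays.copyOfRange(labels, start, labels.length));
            if (suffixes.contains(candidate)) {
                return labels[start - 1] + "." + candidate;
            }
        }
        return lower; // no known suffix: fall back to the whole host
    }
}
```

With the suffixes `uk`, `co.uk` and `com`, the host `news.bbc.co.uk` maps to `bbc.co.uk`, because `co.uk` is matched before the shorter `uk`.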
[ https://issues.apache.org/jira/browse/NUTCH-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Enis Soztutar updated NUTCH-445:
Attachment: TranslatingRawFieldQueryFilter_v1.0.patch
This patch complements
[ https://issues.apache.org/jira/browse/NUTCH-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Enis Soztutar updated NUTCH-445:
Attachment: index_query_domain_v1.2.patch
This patch is an update of the previous three patches.
dedup on tokenized fields is faulty
---
Key: NUTCH-455
URL: https://issues.apache.org/jira/browse/NUTCH-455
Project: Nutch
Issue Type: Bug
Components: searcher
Affects Versions: 0.9.0
[ https://issues.apache.org/jira/browse/NUTCH-455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12479262 ]
Enis Soztutar commented on NUTCH-455:
-
(from LUCENE-252)
In Nutch we have three options: the first is to disallow
[ https://issues.apache.org/jira/browse/NUTCH-466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12485977 ]
Enis Soztutar commented on NUTCH-466:
-
This patch will indeed resolve many issues related to storing extra
[ https://issues.apache.org/jira/browse/NUTCH-466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12485996 ]
Enis Soztutar commented on NUTCH-466:
-
There may be many parts that use the same key/value classes in MapFiles.
Fix synchronization in NutchBean creation
-
Key: NUTCH-471
URL: https://issues.apache.org/jira/browse/NUTCH-471
Project: Nutch
Issue Type: Bug
Components: searcher
Affects Versions:
[ https://issues.apache.org/jira/browse/NUTCH-471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Enis Soztutar updated NUTCH-471:
Attachment: NutchBeanCreationSync_v1.patch
This patch synchronizes NutchBean.get(ServletContext
[ https://issues.apache.org/jira/browse/NUTCH-471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12491313 ]
Enis Soztutar commented on NUTCH-471:
-
Nice trick with the unsynchronized check. :)
Wow, indeed I have used a
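The "unsynchronized check" is the double-checked locking idiom: a cheap unsynchronized read first, then a synchronized re-check before creating the expensive object. A minimal sketch, assuming a plain field in place of the servlet-context lookup in NutchBean.get; `LazyBeanHolder` is a hypothetical name, not Nutch code.

```java
import java.util.function.Supplier;

/**
 * Hypothetical sketch of double-checked locking. The field must be volatile;
 * without it another thread could observe a partially constructed instance.
 */
final class LazyBeanHolder<T> {
    private final Supplier<T> factory;
    private volatile T instance;

    LazyBeanHolder(Supplier<T> factory) {
        this.factory = factory;
    }

    T get() {
        T result = instance;           // unsynchronized fast-path check
        if (result == null) {
            synchronized (this) {
                result = instance;     // re-check while holding the lock
                if (result == null) {
                    result = factory.get();
                    instance = result; // publish via the volatile write
                }
            }
        }
        return result;
    }
}
```

After the first call, every `get()` returns the cached instance without taking the lock.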
[ https://issues.apache.org/jira/browse/NUTCH-475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12491882 ]
Enis Soztutar commented on NUTCH-475:
-
We can use a formula like:
delay = alpha * delay + (1 - alpha) * (k *
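The formula is an exponential moving average of the delay. Its second term is truncated after `(k *` in the comment, so the sketch below treats it as an opaque `target` value supplied by the caller rather than guessing its definition; `AdaptiveDelaySketch` is a hypothetical name.

```java
/**
 * Hypothetical sketch of delay = alpha * delay + (1 - alpha) * target.
 * "target" stands in for the truncated (k * ...) term of the original comment.
 */
final class AdaptiveDelaySketch {
    private final double alpha;
    private double delay;

    AdaptiveDelaySketch(double alpha, double initialDelay) {
        if (alpha < 0.0 || alpha > 1.0) {
            throw new IllegalArgumentException("alpha must be in [0, 1]");
        }
        this.alpha = alpha;
        this.delay = initialDelay;
    }

    /** Blends the previous delay with the new target and returns the updated delay. */
    double update(double target) {
        delay = alpha * delay + (1.0 - alpha) * target;
        return delay;
    }
}
```

With `alpha` close to 1 the delay reacts slowly to new values; with `alpha` close to 0 it follows `target` almost immediately. For example, starting from 1000 with `alpha = 0.5`, two updates toward 2000 give 1500 and then 1750.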
[ https://issues.apache.org/jira/browse/NUTCH-471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Enis Soztutar updated NUTCH-471:
Attachment: NutchBeanCreationSync_v2.patch
From
IndexMerger delete working dir
--
Key: NUTCH-510
URL: https://issues.apache.org/jira/browse/NUTCH-510
Project: Nutch
Issue Type: Improvement
Components: indexer
Affects Versions: 1.0.0
[ https://issues.apache.org/jira/browse/NUTCH-510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Enis Soztutar updated NUTCH-510:
Attachment: index.merger.delete.temp.dirs.patch
Attached patch deletes working dirs on finally
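Deleting working directories in a `finally` block guarantees cleanup whether the merge succeeds or throws. The sketch below assumes local `java.nio` paths, whereas the merger presumably works on a Hadoop file system; `WorkingDirSketch`, `WorkInDir` and `runWithWorkingDir` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch: run work in a temporary directory and always delete it. */
final class WorkingDirSketch {
    @FunctionalInterface
    interface WorkInDir {
        void run(Path dir) throws IOException;
    }

    private WorkingDirSketch() {}

    static void runWithWorkingDir(WorkInDir work) throws IOException {
        Path workDir = Files.createTempDirectory("merge-work");
        try {
            work.run(workDir);
        } finally {
            deleteRecursively(workDir); // runs on success and on failure
        }
    }

    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        List<Path> ordered;
        try (Stream<Path> paths = Files.walk(root)) {
            // Reverse order visits children before their parent directories.
            ordered = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : ordered) {
            Files.delete(p);
        }
    }
}
```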
[ https://issues.apache.org/jira/browse/NUTCH-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511043 ]
Enis Soztutar edited comment on NUTCH-510 at 7/9/07 5:32 AM:
-
Attached patch deletes
[ https://issues.apache.org/jira/browse/NUTCH-508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511121 ]
Enis Soztutar commented on NUTCH-508:
-
The TaskTracker invokes another JVM calling TaskTracker$Child, but
[ https://issues.apache.org/jira/browse/NUTCH-439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Enis Soztutar updated NUTCH-439:
Attachment: tld_plugin_v2.0.patch
I have made major improvements to the code and configuration
[ https://issues.apache.org/jira/browse/NUTCH-439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Enis Soztutar updated NUTCH-439:
Attachment: domain.suffixes_v2.1.patch
Very nice patch!
Thanks!
IP_PATTERN - it could be
[ https://issues.apache.org/jira/browse/NUTCH-439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Enis Soztutar updated NUTCH-439:
Attachment: (was: domain.suffixes_v2.1.patch)
Top Level Domains Indexing / Scoring
[ https://issues.apache.org/jira/browse/NUTCH-439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Enis Soztutar updated NUTCH-439:
Attachment: tld_plugin_v2.1.patch
Oops, it seems that I've uploaded the wrong file. This is the
Fix OpicScoringFilter to respect scoring filter chaining
Key: NUTCH-518
URL: https://issues.apache.org/jira/browse/NUTCH-518
Project: Nutch
Issue Type: Bug
Components:
[ https://issues.apache.org/jira/browse/NUTCH-517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Enis Soztutar updated NUTCH-517:
Attachment: build.encoding.patch
Patch for UTF-8 is attached
build encoding should be UTF-8
[ https://issues.apache.org/jira/browse/NUTCH-439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Enis Soztutar updated NUTCH-439:
Attachment: tld_plugin_v2.2.patch
This patch includes core domain utilities and the tld plugin, but
[ https://issues.apache.org/jira/browse/NUTCH-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513819 ]
Enis Soztutar commented on NUTCH-518:
-
Since there is no ordering among scoring filters, if we do something
[ https://issues.apache.org/jira/browse/NUTCH-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513826 ]
Enis Soztutar commented on NUTCH-518:
-
I think removing initial score arguments and merging scores in
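Filter chaining means each scoring filter receives the score produced by the previous filter instead of starting again from an initial value; a filter that ignores its incoming score breaks the chain. A minimal sketch, assuming a simplified single-method filter interface (`ScoreFilterSketch` and `ScoringChainSketch` are hypothetical and are not Nutch's `ScoringFilter` API):

```java
import java.util.List;

/** Hypothetical stand-in for a scoring filter: maps an incoming score to a new one. */
interface ScoreFilterSketch {
    float apply(String url, float score);
}

/** Hypothetical sketch: feeds each filter's output into the next filter. */
final class ScoringChainSketch {
    private ScoringChainSketch() {}

    static float score(String url, float initial, List<ScoreFilterSketch> filters) {
        float score = initial;
        for (ScoreFilterSketch filter : filters) {
            // Each filter builds on the accumulated score rather than replacing it.
            score = filter.apply(url, score);
        }
        return score;
    }
}
```

The result depends on the order in which filters run: applying a doubling filter and then an add-one filter to a score of 1 gives 3, while the reverse order gives 4.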
[ https://issues.apache.org/jira/browse/NUTCH-439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Enis Soztutar updated NUTCH-439:
Attachment: tld_plugin_v2.3.patch
bq. TLDScoringFilter contains a misspelled field, tldEnties, it
Index url field untokenized
---
Key: NUTCH-541
URL: https://issues.apache.org/jira/browse/NUTCH-541
Project: Nutch
Issue Type: New Feature
Components: indexer, searcher
Affects Versions: 1.0.0
[ https://issues.apache.org/jira/browse/NUTCH-439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12521033 ]
Enis Soztutar commented on NUTCH-439:
-
Recently Matt Cutts has written about the parts of URLs:
[ https://issues.apache.org/jira/browse/NUTCH-442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12534869 ]
Enis Soztutar commented on NUTCH-442:
-
Using Nutch with Solr has been a much-requested feature, so it will be
[ https://issues.apache.org/jira/browse/NUTCH-442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12537954 ]
Enis Soztutar commented on NUTCH-442:
-
Due to the method signature bug
[ https://issues.apache.org/jira/browse/NUTCH-574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541326 ]
Enis Soztutar commented on NUTCH-574:
-
Why don't you just refactor the anchor-indexing code into another plugin, say
[ https://issues.apache.org/jira/browse/NUTCH-574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541359 ]
Enis Soztutar commented on NUTCH-574:
-
Honestly, I don't think not indexing anchor words that do not appear in
[ https://issues.apache.org/jira/browse/NUTCH-573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Enis Soztutar reassigned NUTCH-573:
---
Assignee: Enis Soztutar
Multiple Domains - Query Search
---
[ https://issues.apache.org/jira/browse/NUTCH-573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Enis Soztutar updated NUTCH-573:
Attachment: multiTermQuery_v1.patch
Here is a patch that enables querying multiple values for the
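To make the comma syntax discussed below concrete, here is a minimal sketch, assuming a clause of the form `field:value1,value2` that should match any of the listed values. The `site` field name in the example and the `OR` rendering are illustrative assumptions, not the syntax implemented in multiTermQuery_v1.patch; `MultiValueClauseSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: parse "field:v1,v2" and render it as a disjunction. */
final class MultiValueClauseSketch {
    private MultiValueClauseSketch() {}

    /** Returns the comma-separated values of the clause, or an empty list if the field differs. */
    static List<String> values(String clause, String field) {
        String prefix = field + ":";
        List<String> result = new ArrayList<>();
        if (!clause.startsWith(prefix)) {
            return result;
        }
        for (String v : clause.substring(prefix.length()).split(",")) {
            String trimmed = v.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed); // skip empty entries such as "a.com,,b.com"
            }
        }
        return result;
    }

    /** Renders the values as "(field:v1 OR field:v2 ...)". */
    static String toDisjunction(String field, List<String> values) {
        List<String> parts = new ArrayList<>();
        for (String v : values) {
            parts.add(field + ":" + v);
        }
        return "(" + String.join(" OR ", parts) + ")";
    }
}
```

For example, `values("site:apache.org,example.com", "site")` yields `[apache.org, example.com]`, which `toDisjunction` renders as `(site:apache.org OR site:example.com)`.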
[ https://issues.apache.org/jira/browse/NUTCH-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12542389 ]
Enis Soztutar commented on NUTCH-573:
-
bq. Using commas is IMHO not intuitive
With respect, I should
[ https://issues.apache.org/jira/browse/NUTCH-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12542449 ]
Enis Soztutar commented on NUTCH-573:
-
@Andrzej
I recall Google over comma-delimited syntax, but now it doesn't
[ https://issues.apache.org/jira/browse/NUTCH-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12543067 ]
Enis Soztutar commented on NUTCH-573:
-
So, how shall we proceed with this one?
I give +1 to commit this, and
FeedParser empty links for items
Key: NUTCH-583
URL: https://issues.apache.org/jira/browse/NUTCH-583
Project: Nutch
Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Enis Soztutar
[ https://issues.apache.org/jira/browse/NUTCH-586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Enis Soztutar updated NUTCH-586:
Attachment: run-core_v1.patch
Attached file adds -core option to bin/nutch.
Add option to run
Add option to run compiled classes w/o job file
---
Key: NUTCH-586
URL: https://issues.apache.org/jira/browse/NUTCH-586
Project: Nutch
Issue Type: New Feature
Affects Versions: 1.0.0
[ https://issues.apache.org/jira/browse/NUTCH-586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548198 ]
Enis Soztutar commented on NUTCH-586:
-
Can someone review this?
Add option to run compiled classes w/o job
[ https://issues.apache.org/jira/browse/NUTCH-586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Enis Soztutar updated NUTCH-586:
Attachment: run-core_v2.patch
bq. I think you also need to put a comment, which clarifies that this
[ https://issues.apache.org/jira/browse/NUTCH-588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Enis Soztutar resolved NUTCH-588.
-
Resolution: Invalid
Jira is not for asking questions. You should ask your questions on nutch-user
[ https://issues.apache.org/jira/browse/NUTCH-442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12637489#action_12637489 ]
Enis Soztutar commented on NUTCH-442:
-
I personally believe this patch should be in
Evaluate ORM Frameworks which support non-relational column-oriented datastores
and RDBMs
--
Key: NUTCH-808
URL: https://issues.apache.org/jira/browse/NUTCH-808
[ https://issues.apache.org/jira/browse/NUTCH-808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12852840#action_12852840 ]
Enis Soztutar commented on NUTCH-808:
-
A candidate framework is DataNucleus. It has the
[ https://issues.apache.org/jira/browse/NUTCH-808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12856124#action_12856124 ]
Enis Soztutar commented on NUTCH-808:
-
So, these are the results so far:
DataNucleus
[ https://issues.apache.org/jira/browse/NUTCH-808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12856360#action_12856360 ]
Enis Soztutar commented on NUTCH-808:
-
bq. What do you mean by current implementation?
[ https://issues.apache.org/jira/browse/NUTCH-808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Enis Soztutar closed NUTCH-808.
---
Resolution: Fixed
We have decided to go ahead with implementing an ORM layer as per the discussion
on
[ https://issues.apache.org/jira/browse/NUTCH-811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12865226#action_12865226 ]
Enis Soztutar commented on NUTCH-811:
-
Hi Piet,
The code for Gora will reside in GitHub