[jira] [Commented] (NUTCH-1554) org.apache.nutch.net.protocols.HttpDateFormat should NOT be Locale.US aware

2013-04-08 Thread Sebastian Nagel (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626292#comment-13626292
 ] 

Sebastian Nagel commented on NUTCH-1554:


Thanks, [~lewismc]. The RFC specifies English names for weekdays and months. 
Non-English locales may use the names of the locale-specific language, e.g., 
Russian "Пн" (="понедельник") instead of "Monday".

> org.apache.nutch.net.protocols.HttpDateFormat should NOT be Locale.US aware
> ---
>
> Key: NUTCH-1554
> URL: https://issues.apache.org/jira/browse/NUTCH-1554
> Project: Nutch
>  Issue Type: Bug
>Affects Versions: 1.6, 2.1
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Minor
> Fix For: 1.7, 2.2
>
> Attachments: NUTCH-1554-2.x.patch, NUTCH-1554-trunk.patch
>
>
> I assume this is legacy code.
> Currently the above class is Locale specific and really should not be. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: FetchSchedule and Metadata

2013-04-08 Thread Canan GİRGİN
Hi Lewis,

The getFields() method of my custom CustomDefaultFetchSchedule class is never
called. Do you have any idea why?

For extension points, the filters' getFields() methods are called before the
job starts.
For example, the ParseFilters' getFields() methods are called by ParserJob.

CustomDefaultFetchSchedule.getFields():

  @Override
  public Set<WebPage.Field> getFields() {
    FIELDS.addAll(super.getFields());
    FIELDS.add(WebPage.Field.METADATA);
    return FIELDS;
  }

When I add the METADATA field in the GeneratorJob class, everything is OK and
the metadata field is not empty:
  static {
    FIELDS.add(WebPage.Field.FETCH_TIME);
    FIELDS.add(WebPage.Field.SCORE);
    FIELDS.add(WebPage.Field.STATUS);
    FIELDS.add(WebPage.Field.METADATA);
  }


Nutch 2.1 / HBASE


On Mon, Apr 8, 2013 at 9:26 AM, Canan GİRGİN  wrote:

> Hi Lewis,
>
> Yes, I *added* language-identifier. In the DB metadata column I can see
> "language=en"
>
>
>
> On Mon, Apr 8, 2013 at 12:06 AM, Lewis John Mcgibbney <
> lewis.mcgibb...@gmail.com> wrote:
>
>> Hi Canan,
>>
>>
>> On Sun, Apr 7, 2013 at 1:41 AM,  wrote:
>>
>>> Then I tried to use the metadata field, but this field is always null:
>>> ByteBuffer blang = page.getFromMetadata(new Utf8(Metadata.LANGUAGE));
>>>
>>>
>> Did you add the language-identifier plugin to the Nutch plugin.includes
>> property in nutch-site.xml?
>>
>
>


Build failed in Jenkins: Nutch-trunk #2162

2013-04-08 Thread Apache Jenkins Server
See 

--
[...truncated 4039 lines...]
[javac]   ^
[javac] 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-trunk/trunk/src/test/org/apache/nutch/crawl/TestCrawlDbMerger.java:108:
 warning: [rawtypes] found raw type: Iterator
[javac] Iterator it = expected.keySet().iterator();
[javac] ^
[javac]   missing type arguments for generic class Iterator
[javac]   where E is a type-variable:
[javac] E extends Object declared in interface Iterator
[javac] 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-trunk/trunk/src/test/org/apache/nutch/crawl/TestCrawlDbMerger.java:123:
 warning: [deprecation] delete(Path) in FileSystem has been deprecated
[javac] fs.delete(testDir);
[javac]   ^
[javac] 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-trunk/trunk/src/test/org/apache/nutch/crawl/TestCrawlDbMerger.java:126:
 warning: [rawtypes] found raw type: TreeSet
[javac]   private void createCrawlDb(Configuration config, FileSystem fs, Path crawldb, TreeSet init, CrawlDatum cd) throws Exception {
[javac] ^
[javac]   missing type arguments for generic class TreeSet
[javac]   where E is a type-variable:
[javac] E extends Object declared in class TreeSet
[javac] 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-trunk/trunk/src/test/org/apache/nutch/crawl/TestCrawlDbMerger.java:130:
 warning: [rawtypes] found raw type: Iterator
[javac] Iterator it = init.iterator();
[javac] ^
[javac]   missing type arguments for generic class Iterator
[javac]   where E is a type-variable:
[javac] E extends Object declared in interface Iterator
[javac] 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-trunk/trunk/src/test/org/apache/nutch/crawl/TestLinkDbMerger.java:71:
 warning: [rawtypes] found raw type: TreeMap
[javac]   TreeMap init1 = new TreeMap();
[javac]   ^
[javac]   missing type arguments for generic class TreeMap
[javac]   where K,V are type-variables:
[javac] K extends Object declared in class TreeMap
[javac] V extends Object declared in class TreeMap
[javac] 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-trunk/trunk/src/test/org/apache/nutch/crawl/TestLinkDbMerger.java:71:
 warning: [rawtypes] found raw type: TreeMap
[javac]   TreeMap init1 = new TreeMap();
[javac]   ^
[javac]   missing type arguments for generic class TreeMap
[javac]   where K,V are type-variables:
[javac] K extends Object declared in class TreeMap
[javac] V extends Object declared in class TreeMap
[javac] 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-trunk/trunk/src/test/org/apache/nutch/crawl/TestLinkDbMerger.java:72:
 warning: [rawtypes] found raw type: TreeMap
[javac]   TreeMap init2 = new TreeMap();
[javac]   ^
[javac]   missing type arguments for generic class TreeMap
[javac]   where K,V are type-variables:
[javac] K extends Object declared in class TreeMap
[javac] V extends Object declared in class TreeMap
[javac] 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-trunk/trunk/src/test/org/apache/nutch/crawl/TestLinkDbMerger.java:72:
 warning: [rawtypes] found raw type: TreeMap
[javac]   TreeMap init2 = new TreeMap();
[javac]   ^
[javac]   missing type arguments for generic class TreeMap
[javac]   where K,V are type-variables:
[javac] K extends Object declared in class TreeMap
[javac] V extends Object declared in class TreeMap
[javac] 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-trunk/trunk/src/test/org/apache/nutch/crawl/TestLinkDbMerger.java:73:
 warning: [rawtypes] found raw type: HashMap
[javac]   HashMap expected = new HashMap();
[javac]   ^
[javac]   missing type arguments for generic class HashMap
[javac]   where K,V are type-variables:
[javac] K extends Object declared in class HashMap
[javac] V extends Object declared in class HashMap
[javac] 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-trunk/trunk/src/test/org/apache/nutch/crawl/TestLinkDbMerger.java:73:
 warning: [rawtypes] found raw type: HashMap
[javac]   HashMap expected = new HashMap();
[javac]  ^
[javac]   missing type arguments for generic class HashMap
[javac]   where K,V are type-variables:
[javac] K extends Object declared in class HashMap
[javac] V extends Object declared in class HashMap
[javac] 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-trunk/trunk/src/test/org/apache/nutc

[jira] [Commented] (NUTCH-1554) org.apache.nutch.net.protocols.HttpDateFormat should NOT be Locale.US aware

2013-04-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626164#comment-13626164
 ] 

Hudson commented on NUTCH-1554:
---

Integrated in Nutch-nutchgora #562 (See 
[https://builds.apache.org/job/Nutch-nutchgora/562/])
revert NUTCH-1554 org.apache.nutch.net.protocols.HttpDateFormat should NOT 
be Locale.US aware (Revision 1465834)

 Result = SUCCESS
lewismc : http://svn.apache.org/viewvc/nutch/branches/2.x/?view=rev&rev=1465834
Files : 
* /nutch/branches/2.x/CHANGES.txt
* 
/nutch/branches/2.x/src/java/org/apache/nutch/net/protocols/HttpDateFormat.java




[jira] [Commented] (NUTCH-1554) org.apache.nutch.net.protocols.HttpDateFormat should NOT be Locale.US aware

2013-04-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626156#comment-13626156
 ] 

Hudson commented on NUTCH-1554:
---

Integrated in Nutch-trunk #2161 (See 
[https://builds.apache.org/job/Nutch-trunk/2161/])
revert NUTCH-1554 org.apache.nutch.net.protocols.HttpDateFormat should NOT 
be Locale.US aware (Revision 1465831)

 Result = SUCCESS
lewismc : http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1465831
Files : 
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/src/java/org/apache/nutch/net/protocols/HttpDateFormat.java




[jira] [Resolved] (NUTCH-1554) org.apache.nutch.net.protocols.HttpDateFormat should NOT be Locale.US aware

2013-04-08 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved NUTCH-1554.
-

Resolution: Not A Problem



[jira] [Commented] (NUTCH-1554) org.apache.nutch.net.protocols.HttpDateFormat should NOT be Locale.US aware

2013-04-08 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626003#comment-13626003
 ] 

Lewis John McGibbney commented on NUTCH-1554:
-

Commit reverted @revision 1465831 in trunk
Commit reverted @revision 1465834 in 2.x

[~wastl-nagel] thank you very much for keeping an eye on the ball here. I must 
admit though, I don't see any mention that English must be used. All I see here 
is that the GMT time mechanism must be adhered to. I have reverted and will 
close this off.
Thanks again.  



[jira] [Reopened] (NUTCH-1554) org.apache.nutch.net.protocols.HttpDateFormat should NOT be Locale.US aware

2013-04-08 Thread Sebastian Nagel (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel reopened NUTCH-1554:



Hi Lewis, the opposite is true: now the HttpDateFormat is sensitive to the 
locale set on the system. If a Russian locale is used, the If-Modified-Since 
date sent in the HTTP header will look like:
{code}
% LC_ALL=ru_RU.utf8 runtime/local/bin/nutch \
org.apache.nutch.net.protocols.HttpDateFormat
Пн, 08 апр 2013 21:24:24 GMT
{code}
That's definitely not the date format specified in the HTTP RFC 
(http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.3.1).

See also: 
http://blog.thetaphi.de/2012/07/default-locales-default-charsets-and.html
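
To illustrate the locale sensitivity, here is a minimal sketch, not the actual 
Nutch class. It assumes the RFC 1123 date pattern with a literal GMT suffix; 
the class name HttpDateSketch and its format method are hypothetical. Passing 
Locale.US pins the English day and month names regardless of the system 
locale, while a Russian locale should give localized names like those in the 
output above, provided the JRE ships that locale data.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical sketch class, not org.apache.nutch.net.protocols.HttpDateFormat.
final class HttpDateSketch {
    // Assumed RFC 1123 style pattern; 'GMT' is a literal because HTTP dates are always GMT.
    static final String PATTERN = "EEE, dd MMM yyyy HH:mm:ss 'GMT'";

    // Formats the date in GMT using the day and month names of the given locale.
    static String format(Date date, Locale locale) {
        SimpleDateFormat f = new SimpleDateFormat(PATTERN, locale);
        f.setTimeZone(TimeZone.getTimeZone("GMT"));
        return f.format(date);
    }

    public static void main(String[] args) {
        // 2013-04-08 21:24:24 GMT, the instant shown in the output above.
        Date d = new Date(1365456264000L);
        // Explicit Locale.US: Mon, 08 Apr 2013 21:24:24 GMT
        System.out.println(format(d, Locale.US));
        // Russian locale: day and month names follow the locale data of the JRE.
        System.out.println(format(d, Locale.forLanguageTag("ru-RU")));
    }
}
```

With an explicit Locale.US the formatted header value stays the same whatever 
LC_ALL is set to, which is why the locale argument should not be dropped.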




[jira] [Commented] (NUTCH-1554) org.apache.nutch.net.protocols.HttpDateFormat should NOT be Locale.US aware

2013-04-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13625761#comment-13625761
 ] 

Hudson commented on NUTCH-1554:
---

Integrated in Nutch-nutchgora #561 (See 
[https://builds.apache.org/job/Nutch-nutchgora/561/])
NUTCH-1554 org.apache.nutch.net.protocols.HttpDateFormat should NOT be 
Locale.US aware (Revision 1465742)

 Result = SUCCESS
lewismc : http://svn.apache.org/viewvc/nutch/branches/2.x/?view=rev&rev=1465742
Files : 
* /nutch/branches/2.x/CHANGES.txt
* 
/nutch/branches/2.x/src/java/org/apache/nutch/net/protocols/HttpDateFormat.java




[jira] [Commented] (NUTCH-1554) org.apache.nutch.net.protocols.HttpDateFormat should NOT be Locale.US aware

2013-04-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13625760#comment-13625760
 ] 

Hudson commented on NUTCH-1554:
---

Integrated in Nutch-trunk #2160 (See 
[https://builds.apache.org/job/Nutch-trunk/2160/])
NUTCH-1554 org.apache.nutch.net.protocols.HttpDateFormat should NOT be 
Locale.US aware (Revision 1465741)

 Result = SUCCESS
lewismc : http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1465741
Files : 
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/src/java/org/apache/nutch/net/protocols/HttpDateFormat.java




[jira] [Resolved] (NUTCH-1554) org.apache.nutch.net.protocols.HttpDateFormat should NOT be Locale.US aware

2013-04-08 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved NUTCH-1554.
-

Resolution: Fixed

Committed @revision 1465741 in trunk
Committed @revision 1465742 in 2.x



[jira] [Assigned] (NUTCH-1554) org.apache.nutch.net.protocols.HttpDateFormat should NOT be Locale.US aware

2013-04-08 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney reassigned NUTCH-1554:
---

Assignee: Lewis John McGibbney



[jira] [Updated] (NUTCH-1554) org.apache.nutch.net.protocols.HttpDateFormat should NOT be Locale.US aware

2013-04-08 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-1554:


Patch Info: Patch Available



[jira] [Updated] (NUTCH-1554) org.apache.nutch.net.protocols.HttpDateFormat should NOT be Locale.US aware

2013-04-08 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-1554:


Attachment: NUTCH-1554-trunk.patch
NUTCH-1554-2.x.patch

trivial patches for trunk and 2.x



[jira] [Commented] (NUTCH-1555) bug in 2.x ParserJob command line parsing

2013-04-08 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13625630#comment-13625630
 ] 

Lewis John McGibbney commented on NUTCH-1555:
-

I was thinking about using Commons CLI for better parsing of command line 
arguments. Over time we have had problems with CLI parsing, and using 
an established framework like Commons CLI or JCommander would really clear that 
up.
Regarding the String type arguments, I suppose we would need to enforce 
stricter type and data checking if we were not to use a framework (here I 
assume a CLI framework would provide us with this functionality from the 
beginning).
What do you think, Feng?
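
As a rough sketch of the stricter checking mentioned above, without any CLI 
framework: the argument is validated before it is accepted as a batch ID. The 
assumptions here are that batch IDs have the form <digits>-<digits> and that 
-all selects every batch; the class BatchIdArgSketch and the method 
parseBatchId are hypothetical and not part of Nutch.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of ParserJob or FetcherJob.
final class BatchIdArgSketch {
    // Assumption: a batch ID looks like "<digits>-<digits>", e.g. "1365414264-1187237211".
    private static final Pattern BATCH_ID = Pattern.compile("\\d+-\\d+");

    // Returns the argument if it is "-all" or matches the assumed batch ID form;
    // otherwise throws IllegalArgumentException instead of silently accepting it.
    static String parseBatchId(String arg) {
        if (arg == null) {
            throw new IllegalArgumentException("missing batchId");
        }
        if ("-all".equals(arg) || BATCH_ID.matcher(arg).matches()) {
            return arg;
        }
        throw new IllegalArgumentException("not a valid batchId: " + arg);
    }

    public static void main(String[] args) {
        System.out.println(parseBatchId("1365414264-1187237211")); // accepted
        try {
            parseBatchId("updatedb"); // the accidental argument from this issue
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A check like this would make "bin/nutch parse updatedb" fail fast rather than 
run with batchId "updatedb".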

> bug in 2.x ParserJob command line parsing 
> --
>
> Key: NUTCH-1555
> URL: https://issues.apache.org/jira/browse/NUTCH-1555
> Project: Nutch
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 2.1
>Reporter: Lewis John McGibbney
> Fix For: 2.2
>
>
> I just accidentally passed in the following argument to parser job
> {code}
> law@CEE279Law3-Linux:~/Downloads/asf/2.x/runtime/local$ ./bin/nutch parse 
> updatedb
> ParserJob: starting
> ParserJob: resuming:  false
> ParserJob: forced reparse:false
> ParserJob: batchId:   updatedb
> ParserJob: success
> {code}
> This is a bug for sure



[jira] [Commented] (NUTCH-1555) bug in 2.x ParserJob command line parsing

2013-04-08 Thread lufeng (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13625432#comment-13625432
 ] 

lufeng commented on NUTCH-1555:
---

Hi Lewis, as you said, FetcherJob has this bug too. Running the command gives 
this result:

{code:java} 
lemo@debian:~/Workspace/java/apache-workspace/nutch2.x-svn/runtime/local$ 
bin/nutch fetch updatedb
FetcherJob: starting
FetcherJob: batchId: updatedb
Fetcher: Your 'http.agent.name' value should be listed first in 
'http.robots.agents' property.
FetcherJob: threads: 10
FetcherJob: parsing: false
FetcherJob: resuming: false
FetcherJob : timelimit set for : -1
Using queue mode : byHost
Fetcher: threads: 10
{code}

This happens because batchId is a String, so any value is accepted.
