Re: [I] Wrong content while downloading files >200Mb [mina-sshd]

2024-05-05 Thread via GitHub


tomaswolf commented on issue #468:
URL: https://github.com/apache/mina-sshd/issues/468#issuecomment-2094915566

   I've merged a change that tightens SFTP read requests. If a server sends 
back more data than requested, an exception is thrown. If you set 
`SftpModuleProperties.TOLERATE_EXCESS_DATA` to `true`, we only log a warning 
and ignore excess data. Whether that helps in your case in unclear, though. 
When a server returns more data than requested, the client cannot know which 
bytes are in excess: some at the beginning, or some at the end? Or some in 
between?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@mina.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@mina.apache.org
For additional commands, e-mail: dev-h...@mina.apache.org



Re: [I] Wrong content while downloading files >200Mb [mina-sshd]

2024-05-05 Thread via GitHub


tomaswolf closed issue #468: Wrong content while downloading files >200Mb
URL: https://github.com/apache/mina-sshd/issues/468


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@mina.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@mina.apache.org
For additional commands, e-mail: dev-h...@mina.apache.org



Re: [I] Wrong content while downloading files >200Mb [mina-sshd]

2024-02-28 Thread via GitHub


tomaswolf commented on issue #468:
URL: https://github.com/apache/mina-sshd/issues/468#issuecomment-1968793786

   At least their server logs confirm that the server _did_ send too much data 
overall. I wonder if the server logs also show the 9 SSH_FXP_DATA replies of 
41084 bytes every 640th request.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@mina.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@mina.apache.org
For additional commands, e-mail: dev-h...@mina.apache.org



Re: [I] Wrong content while downloading files >200Mb [mina-sshd]

2024-02-28 Thread via GitHub


bferrenq commented on issue #468:
URL: https://github.com/apache/mina-sshd/issues/468#issuecomment-1968742339

   Hello @tomaswolf ,
   I still do not have the server information, and will let you know.
   I have receveid below feedback:
   "There was 2 transfers on 2024-02-26: 
   At  09:29:44.039 UTC, the READ_FILE operation transferred 32768 bytes.
   The corresponding READ_DONE operation was logged at 09:31:54.652 UTC, with 
204425799 bytes transferred.
   
   Later that day, at 15:28:50.850 UTC, the READ_FILE operation transferred 
32755 (difference of 13 bytes).
   The corresponding READ_DONE operation was logged at 15:31:06.338 UTC, with 
204500679 bytes transferred.
   "
   I will let you know once I have more information.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@mina.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@mina.apache.org
For additional commands, e-mail: dev-h...@mina.apache.org



Re: [I] Wrong content while downloading files >200Mb [mina-sshd]

2024-02-26 Thread via GitHub


tomaswolf commented on issue #468:
URL: https://github.com/apache/mina-sshd/issues/468#issuecomment-1964011568

   I've been wondering about -13 myself. The -13 aligns the SFTP buffer size 
such that the whole SSH_FXP_DATA SFTP message fits a channel packet exactly. 
(There's 13 bytes of overhead in the SFTP protocol.) However, I wonder if that 
is correct. The SSH connection layer adds another 9 bytes to this.  If the 
intent was to ensure an SFTP data message is not split over two SSH channel 
packets, it seems to me the value should be -22.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@mina.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@mina.apache.org
For additional commands, e-mail: dev-h...@mina.apache.org



Re: [I] Wrong content while downloading files >200Mb [mina-sshd]

2024-02-26 Thread via GitHub


bferrenq commented on issue #468:
URL: https://github.com/apache/mina-sshd/issues/468#issuecomment-1963723363

   Hello @tomaswolf ,
   We have done a test as well by using 32768 buffer size instead of 32768 -13 
coming from 
[AbstractSftpClient](https://github.com/apache/mina-sshd/blob/de46a157c4ce9a9d070fd781d70530dad9141e50/sshd-sftp/src/main/java/org/apache/sshd/sftp/client/impl/AbstractSftpClient.java#L1217).
   
   Using that setting file is not corrupted.
   
   Do you know what is the rational behind this  above  "- 13"?
   
   Hope this helps your investigation.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@mina.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@mina.apache.org
For additional commands, e-mail: dev-h...@mina.apache.org



Re: [I] Wrong content while downloading files >200Mb [mina-sshd]

2024-02-26 Thread via GitHub


bferrenq commented on issue #468:
URL: https://github.com/apache/mina-sshd/issues/468#issuecomment-1963661933

   [offset.zip](https://github.com/apache/mina-sshd/files/14402104/offset.zip)
   Hello Tomas,
   
   - You are right regarding offset_ko logs, I have uploaded correct ones that 
only contains 9 occurences of the unexpected length.
   - I have requested the server information, and will keep you posted once I 
have it,
   - I may be wrong, I would say problem is starting around offset 20971518 in 
our csv file.
   That csv row contains the begining of valid data then a value that does not 
exist in the original file, then unexpected LF.
   From there start indeed data duplication of several lines
   Then we jump to offset 41943040 where we have another corrupted row, then 
content duplication
   
   I still not have approval to share data or any kind of reproducer to ease 
investigation, thanks again for your analysis!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@mina.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@mina.apache.org
For additional commands, e-mail: dev-h...@mina.apache.org



Re: [I] Wrong content while downloading files >200Mb [mina-sshd]

2024-02-24 Thread via GitHub


tomaswolf commented on issue #468:
URL: https://github.com/apache/mina-sshd/issues/468#issuecomment-1962742141

   Can you please verify the locations (byte offsets) of the duplications. From 
your log, I would expect the first such duplication to be at offset 20995955: 
8320 bytes duplicated, i.e., followed by the same 8320 bytes beginning at 
offset 21004275.
   
   I don't see anything in our code that would cause the observed behavior. So 
my current assumption is still that this is some strange server-side bug. What 
worries me, though: according to your logs, the server is 
partnerupload.google.com. And they use dropbear and a buggy SFTP server? That 
sounds somewhat unlikely.
   
   With dropbear one uses normally either the OpenSSH SFTP server or the Green 
End server. I see nothing in their sources that would explain the observed 
behavior. Of course it's possible that the server uses something else still.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@mina.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@mina.apache.org
For additional commands, e-mail: dev-h...@mina.apache.org



Re: [I] Wrong content while downloading files >200Mb [mina-sshd]

2024-02-23 Thread via GitHub


tomaswolf commented on issue #468:
URL: https://github.com/apache/mina-sshd/issues/468#issuecomment-1961902386

   41804 - 32764 = 8320
   
   204500679 - 204425799 = 74880
   
   74880 / 8320 = 9
   
   The offset_ko.txt log seems to contain the result of three consecutive 
downloads? The first two failed or otherwise cut short? The last one (starting 
at 2024-02-23 14:35:26,153) has 9 times length=41804, so that matches up.
   
   And 8320 / 640 = 13. Mighty strange. Especially since 32755 + 13 = 32768 = 
32k. Coincidence?
   
   If you can tell or find out, I'd still very much like to know what SFTP 
server is being used with that dropbear.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@mina.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@mina.apache.org
For additional commands, e-mail: dev-h...@mina.apache.org



Re: [I] Wrong content while downloading files >200Mb [mina-sshd]

2024-02-23 Thread via GitHub


tomaswolf commented on issue #468:
URL: https://github.com/apache/mina-sshd/issues/468#issuecomment-1961875091

   I cannot answer your questions. I don't know what other buffer sizes these 
other SFTP clients used, or why it occurs every 640 requests. Perhaps the other 
clients might have worked because they used different buffer sizes?
   
   As for solving it client-side, I'm thinking of ignoring the extra data, and 
logging a warning.
   
   But I'll also have to double-check that this is not some strange bug in the 
lower layers in Apache MINA SSHD. Like, the server sending data correctly, but 
in the client somehow assembling packets to SFTP buffers goes wrong. Don't know 
yet what exactly is up.
   
   And yes, it is strange it occurs so regularly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@mina.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@mina.apache.org
For additional commands, e-mail: dev-h...@mina.apache.org



Re: [I] Wrong content while downloading files >200Mb [mina-sshd]

2024-02-23 Thread via GitHub


baiglin commented on issue #468:
URL: https://github.com/apache/mina-sshd/issues/468#issuecomment-1961830825

   That is great if this is indeed the reason! Really strange though why the 
server would do that on a regular pattern... 
   
   Something in the client would be perfect, in this case you should raise an 
exception or skip the extra data sent ? 
   
   2 questions:
   - Is it why when we increased the buffer size, it worked ? Also could you 
tell us more exactly what this buffer size is meant for ? I mean in the process 
of reading, where this buffer is used?
   - How do you think other client managed to work? 
   IN any case thanks a lot for the quick replies and help.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@mina.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@mina.apache.org
For additional commands, e-mail: dev-h...@mina.apache.org



Re: [I] Wrong content while downloading files >200Mb [mina-sshd]

2024-02-23 Thread via GitHub


tomaswolf commented on issue #468:
URL: https://github.com/apache/mina-sshd/issues/468#issuecomment-1961754170

   Why did I miss that? Anyway: no need for further logs. You found the reason. 
That's exactly what I wrote above: from code inspection, this can happen if the 
server returns _more_ data than requested, which it never must do. If it does, 
the server is broken.
   
   But we can do something about this in the client side.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@mina.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@mina.apache.org
For additional commands, e-mail: dev-h...@mina.apache.org



Re: [I] Wrong content while downloading files >200Mb [mina-sshd]

2024-02-23 Thread via GitHub


bferrenq commented on issue #468:
URL: https://github.com/apache/mina-sshd/issues/468#issuecomment-1961701848

   Hello,
   You are correct, file size is supposed to be 204425799 whereas it is 
204500679, and unless I did a mistake, logs are for the corrupted file.
   
   Regarding streaming, we simply taking Mina's Inpustream then write in a 
local file. I may be wrong, still the fact that tweaking mentioned parameter 
makes file correctly fetched, so this should discard problem on that part, no?
   
   Then file seems corrupted after 2024-02-23 14:13:33,806, so:
id=744, offset=20963200, length=32755
   
   I do not know if this is expected, please notice below the "- len=41084" (19 
times in the file)
   
2024-02-23 14:13:33,816 TRACE o.a.s.s.c.i.SftpInputStreamAsync [main] 
pollBuffer(SftpInputStreamAsync[ClientSessionImpl[tim-amad...@partnerupload.google.com/64.233.167.130:19321]][./market_reference_data_2024-02-21.csv])
 response=103 for ack=SftpAckData[id=744, offset=20963200, length=32755] - 
len=41084

whereas most of the time we have "- len=32764" (13k times):

2024-02-23 14:13:33,799 TRACE o.a.s.s.c.i.SftpInputStreamAsync [main] 
pollBuffer(SftpInputStreamAsync[ClientSessionImpl[u...@partnerupload.google.com/64.233.167.130:22]][./data.csv])
 response=103 for ack=SftpAckData[id=743, offset=20930445, length=32755] - 
len=32764
   
   Before providing full logs, I need provider consent, request ongoing.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@mina.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@mina.apache.org
For additional commands, e-mail: dev-h...@mina.apache.org



Re: [I] Wrong content while downloading files >200Mb [mina-sshd]

2024-02-23 Thread via GitHub


tomaswolf commented on issue #468:
URL: https://github.com/apache/mina-sshd/issues/468#issuecomment-1961616466

   Something strange is going on at the end; the last request (id=6346) should 
be sent for offset 204423955 + 1842, not +32755. That looks like a bug, but it 
doesn't explain duplication inside the file. At what byte offsets do these 
duplications occur?
   
   Given that log I'd expect a file of 204425799 bytes, which is what you wrote 
it should be. Are you sure that is a log of a case gone wrong? If so, the 
problem cannot be in receiving the data but somewhere in later processing in 
the stream.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@mina.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@mina.apache.org
For additional commands, e-mail: dev-h...@mina.apache.org



Re: [I] Wrong content while downloading files >200Mb [mina-sshd]

2024-02-23 Thread via GitHub


bferrenq commented on issue #468:
URL: https://github.com/apache/mina-sshd/issues/468#issuecomment-1961488794

   
[offset-ko.zip](https://github.com/apache/mina-sshd/files/14386589/offset-ko.zip)
   Indeed ko-logs in trace mode are huge, 800Mb compressed...
   I will see how to expose them, in the between I did a grep on the log file 
around SftpInputStreamAsync that contains offset and length, I hope this can 
already help.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@mina.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@mina.apache.org
For additional commands, e-mail: dev-h...@mina.apache.org



Re: [I] Wrong content while downloading files >200Mb [mina-sshd]

2024-02-23 Thread via GitHub


tomaswolf commented on issue #468:
URL: https://github.com/apache/mina-sshd/issues/468#issuecomment-1961161378

   I need the log with log level TRACE. Only that will log precise offsets and 
lengths of data received.
   
   I also will need byte offsets. Line numbers won't help me.
   
   If you're worried that trace logging might reveal sensitive data (buffer 
contents), feel free to tell me via private e-mail (on the address I use in the 
git commits) how and where to fetch the logs. They'll probably be too large to 
attach here anyway.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@mina.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@mina.apache.org
For additional commands, e-mail: dev-h...@mina.apache.org



Re: [I] Wrong content while downloading files >200Mb [mina-sshd]

2024-02-23 Thread via GitHub


bferrenq commented on issue #468:
URL: https://github.com/apache/mina-sshd/issues/468#issuecomment-1961126065

   Hi,
   Attached ko logs are associated to a csv file that contains 5848831 lines, 
that are <50 chars long.
   Every #595317 lines (so #600K lines), the last #210 lines are duplicated. So 
it seems there is a kind of pattern ie not random behavior.
   
   Benoit
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@mina.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@mina.apache.org
For additional commands, e-mail: dev-h...@mina.apache.org



Re: [I] Wrong content while downloading files >200Mb [mina-sshd]

2024-02-23 Thread via GitHub


bferrenq commented on issue #468:
URL: https://github.com/apache/mina-sshd/issues/468#issuecomment-1960909713

   
[logs-ok-ko.zip](https://github.com/apache/mina-sshd/files/14383010/logs-ok-ko.zip)
   Hello @tomaswolf  and thx for you quick reply!
   
   One **major fact** I forgot to mention, using linux sftp or 
[paramiko](https://ssh-paramiko.readthedocs.io/en/master/) , downloaded file is 
as expected.
   
   Still I also do think it seems related to server behavior, now given native 
sftp is working fine, I do not know what to ask to that provider.
   
   In order to help investigation, as I could not yet find a reproducer using 
for instance SftpTransferTest, I have attached logs:
   
   -  when file size is fine, 204,425,799 bytes
   -  and when file size is bigger than expected, 204,500,679 bytes
   
I will try to find the precise offset, I try to remember it happen in the 
middle of the file, bunch of lines are duplicated.
   
   Thanks again for your support on that issue puzzling me!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@mina.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@mina.apache.org
For additional commands, e-mail: dev-h...@mina.apache.org



Re: [I] Wrong content while downloading files >200Mb [mina-sshd]

2024-02-22 Thread via GitHub


tomaswolf commented on issue #468:
URL: https://github.com/apache/mina-sshd/issues/468#issuecomment-1960124555

   AFAIK dropbear does not have SFTP, so some external SFTP is used. Either 
OpenSSH, or some other implementation. Do you have more information on that 
server?
   
   Since it occurs apparently only with that particular server my first guess 
would be that this is a server problem. Though it is not impossible that it 
might be some bug in Apache MINA SSHD that is triggered only in very specific 
circumstances that just happen to occur only with this server...
   
   Tweaking parameters or the buffer size should not be necessary; looks to me 
that this tweaking just happens to avoid the precise circumstances that trigger 
this bug, wherever it might be, in your case.
   
   Absent a reproducer that we could live debug, this would need a full trace 
log, plus information about the file content (at what offsets exactly do 
duplications occur? How long are the duplications? Expected/actual file size?) 
The trace log should have information about all blocks received, and their 
offsets and lengths. But that log might be huge.
   
   From code inspection I see only one possibility for something like this to 
occur: if the server returns _more_ data than the client requested for a chunk. 
But that should never happen; such a server would violate the SFTP protocol.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@mina.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@mina.apache.org
For additional commands, e-mail: dev-h...@mina.apache.org